

Google Destroyed the Web - dkasper
http://www.bobpritchett.com/blog/2009/12/google_destroyed_the_web.html
I think the last point is the most interesting. Do we need a search engine that punishes sites with ads to get rid of spam?
======
jacquesm
Google destroyed the web much more by using PageRank than they ever did with
AdSense.

The simple cure for MFA (made-for-AdSense) sites would be to ban pages that
use Google AdSense from search engine results pages.

10-minute fix. Tops. But it would also seriously impact Google's revenues, and
that is why it isn't going to happen.

A really good argument against having too many of these services in the hands
of the same company.

The bigger problem, and one that is much harder to fix, is that by using the
structure of the web to measure page popularity, Google has put a premium on
manipulating the structure of the web for profit.

The damage that has done is a lot bigger than that of MFA sites, which only
affect search engine results.

It is a logical consequence of a search engine's ranking method that the
damage is limited to whatever is being measured.

AltaVista led to keyword spam, which was limited to the pages of the sites
owned by those trying to gain as much traffic as possible from it; Google has
led to link spam for exactly the same reason.
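To make the incentive concrete, here is a minimal power-iteration sketch of the basic, textbook PageRank idea (using the commonly cited damping factor of 0.85; Google's production ranking is far more involved and is not shown here). The page names and the five-page "link farm" are hypothetical; the only point is that pages created purely to link to a target raise that target's score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical example: a small honest web plus one page nobody links to.
honest_web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": []}
# The same web plus a hypothetical link farm: five pages that only link to "spam".
farmed_web = dict(honest_web, **{f"farm{i}": ["spam"] for i in range(5)})

print("spam rank without farm:", round(pagerank(honest_web)["spam"], 3))
print("spam rank with farm:   ", round(pagerank(farmed_web)["spam"], 3))
```

With these toy graphs the farm lifts the "spam" page from roughly 0.05 to roughly 0.17, even though no honest page links to it.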

~~~
gojomo
I have long wanted an 'advanced search operator' that excludes sites running
AdSense ads. Similar to the 'site:' or 'link:' operators, we could have
'ads-from:'. So when I want sites without AdSense, I could add this to my query:

    -ads-from:google.com

(I'm not holding my breath, though.)
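A minimal sketch of how such an operator could work, assuming the crawler has already recorded which ad networks each page loads. The `Result` type, its `ad_networks` field and the `-ads-from:` syntax are all hypothetical; no search engine is claimed to support them.

```python
from dataclasses import dataclass, field

EXCLUDE_PREFIX = "-ads-from:"  # hypothetical operator syntax


@dataclass
class Result:
    url: str
    # Assumption: the crawler already recorded which ad networks the page loads.
    ad_networks: set = field(default_factory=set)


def parse_query(query):
    """Split a raw query into plain search terms and excluded ad networks."""
    terms, excluded = [], set()
    for token in query.split():
        if token.startswith(EXCLUDE_PREFIX):
            excluded.add(token[len(EXCLUDE_PREFIX):])
        else:
            terms.append(token)
    return " ".join(terms), excluded


def filter_results(results, excluded):
    """Drop every result that loads ads from any excluded network."""
    return [r for r in results if not (r.ad_networks & excluded)]


terms, excluded = parse_query("battle of the coral sea -ads-from:google.com")
results = [
    Result("https://en.wikipedia.org/wiki/Battle_of_the_Coral_Sea"),
    Result("http://example.com/coral-sea-facts", {"google.com"}),  # hypothetical MFA page
]
kept = filter_results(results, excluded)
print(terms, [r.url for r in kept])
```

The hard part in practice is the assumption itself: reliably detecting which ad networks a page serves, which this sketch takes as given.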

~~~
jacquesm
If Microsoft wants to have some real fun with Bing, they'll add that option.

I might even switch :)

------
ThinkWriteMute
_There’s an opportunity here to create a web search engine that punishes
results littered with ads. Google can’t do it – they live off those ads._

This is absolute bullshit and it looks like a majority of replies here are
eating it up.

Google actively de-lists pages that aren't content-intensive, and I mean
serious, legible content.

Research some SEO and IM (internet marketing) work before you suggest that
Google does nothing to remove AdSense-filled pages. It's this kind of blogging
that really muddies the waters.

~~~
ThinkWriteMute
_Looks like he denied my post directly on the blog. Real big of him._ False
alarm; he was just slow to approve it.

In parting, I'd like to say this: _wake up_. Ad-riddled sites get tossed up in
milliseconds (I know, I've written an engine to do so). You can't _possibly_
make a search engine that de-lists them fast enough. SEO IMers will _fucking
break you_, and that's just the way it's going to be unless someone (God
forbid) regulates who can make websites and how fast.

_Postscript_: there are millions, if not hundreds of millions, of legit
websites that use the easy and unobtrusive AdSense ads to pay for (sometimes)
very expensive hosting.

You want to talk about destroying the internet? You're describing the
obliteration of (almost) any non-corporate website.

------
Perceval
It might be interesting to try to design a reverse search engine, a kind of
internet spam filter. Instead of being designed to promote relevant content,
it would be designed to explicitly _not_ return crap content.

Rather than trying to improve Google's resiliency against noise, it might be
more fruitful to turn the problem around.

~~~
GHFigs
The answer is people. People you trust to not provide you with crap. Sometimes
that's friends, sometimes that's family, sometimes that's strangers. It's
never social media webcocks who want to "friend you", never bloggers offering
you "n Ways to X your Y" for their $5/mo. in AdSense earnings, and only rarely
the nebulous "crowd".

~~~
ramchip
Not so sure. When I'm looking for information on, say, WWII history, I can't
ask someone or go to a trusted website the way I can when I'm looking for
information in a field I'm familiar with. I don't already have any
acquaintances or favourite websites for it, which is why I'm using a search
engine.

The large amount of [quiz, farm, friend-of-the-day] spam on my Facebook, and
the various hoax chain letters my not-too-web-savvy aunt sometimes sends us,
also make me a little jaded about the value of 'social $whatever' for
information gathering.

~~~
GHFigs
_WWII history, I can't ask someone or go on a trusted website like when I'm
looking for information on a field I'm familiar with._

WWII history isn't such a hot topic for crapflooding, either. Search is still
useful there. You're also still applying trust heuristics based on the
results: "battle of the coral sea" returns results from Wikipedia,
history.navy.mil, worldwar2history.info, history.sandiego.edu, etc. You
probably have some idea about the content of each. Ultimately your decisions
about what to click come down to your perception of trust in their authors'
ability to provide you the information you desire.

I'm just _guessing_, but I suspect most people here are familiar with the
phenomenon of doing a Google search only to end up clicking on the Wikipedia
result that invariably makes it into the top few results. This is a sign that
Google's PageRank still aligns reasonably well with our own estimation of
trust for those search terms.

As for social whatevers: that's why I said "People you trust to not provide
you with crap." You can't trust most people to do that. Least of all when
given a megaphone and/or a financial incentive.

------
pmarsh
If you want less page spam in your results write better queries into the
search engine.

There are still plenty of good results, and it's fairly easy to ignore these
spam pages, just like you ignore banner ads.

If the author had said that Google has gotten worse, there might be reason to
worry. But they're constantly improving, and I wouldn't worry about them
getting beaten at this game, since they're the ones who control the rules.

------
m0th87
Has the author forgotten how impossible it was to navigate the WWW prior to
Google? They essentially fixed search relative to the competitive solutions at
the time (Lycos, AltaVista... _shudder_).

~~~
lmoorman
That's like saying, "Do you remember how hard it was to use a PC before
Windows came around?" The problems solved yesterday do not give you a pass
today.

~~~
Semiapies
Yes, but anyone saying, "Windows/OS X/Linux/_your OS here_ destroyed the PC"
is more blatantly trolling than the linked post.

------
bobbyi
_Today I searched for the answer to a question. The top hit was a useful
article written by a subject-matter expert. (Good job, Google.)_

Not really destroyed then.

------
stcredzero
What he wants isn't a search engine that punishes ads. What he really wants is
a search engine that punishes _bad taste_. (Specifically: too many ads,
greedily placed, and greedily optimized, poorly written text.)

I think that such a search engine has tremendous profit potential. Would it be
possible to detect redundancy? Is it possible to detect poor design?

~~~
10ren
Wasn't he focusing mainly on the result not answering his question, rather
than on taste (though both are factors)?

At first I was thinking that what you seek amounts to strong AI, but then I
remembered that there are measures for objectively assessing how easy text is
to understand (even Word has some built in). Ironically, text that is
understandable at a lower reading age is usually easier to understand (i.e.
better) than text so convoluted that it requires multiple PhDs to parse.

Unfortunately, Google is not finding these metrics for me, because I don't
know what they're called... ah, here we go: Passive Sentences Test, Flesch
Reading Ease and Flesch-Kincaid Grade Level.
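The two Flesch formulas are simple enough to compute directly: Reading Ease = 206.835 - 1.015 x (words/sentences) - 84.6 x (syllables/words), and Grade Level = 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59. A minimal sketch follows. The syllable counter is a crude vowel-group heuristic (an assumption, not how Word or other tools count), so its scores will differ somewhat from dedicated tools.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("make"), but not in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


simple = "The cat sat on the mat."
convoluted = ("Notwithstanding considerable institutional heterogeneity, comprehensive "
              "organizational restructuring necessitates multidimensional evaluation.")
print(flesch_scores(simple))
print(flesch_scores(convoluted))
```

Higher Reading Ease means easier text; the Grade Level maps the same two ratios onto a US school grade, so the convoluted sentence scores far worse on both.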

~~~
stcredzero
I bet Google could do some simple metrics based on usefulness ratings by users
(vote arrows) and number of ads. I bet they could also develop metrics to
measure the same ratings against "degree of search engine optimization."
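One way such a metric might be sketched, purely as an illustration: estimate usefulness from up/down votes with the Wilson score lower bound (a standard technique for ranking by ratings, swapped in here as an assumption), then damp the score by the number of ads on the page. The `page_score` blend and the `ad_penalty` weight are hypothetical, not anything Google is known to use, and "degree of SEO" is not modelled.

```python
import math


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the share of upvotes.

    z=1.96 corresponds to roughly 95% confidence. Pages with few votes get a
    conservative (low) estimate instead of a noisy 100%.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def page_score(relevance, upvotes, downvotes, ad_count, ad_penalty=0.1):
    """Hypothetical blend: relevance, boosted by votes and damped by ad count.

    The 0.5 floor keeps unrated pages from scoring zero; ad_penalty is an
    arbitrary illustrative weight.
    """
    usefulness = wilson_lower_bound(upvotes, downvotes)
    return relevance * (0.5 + usefulness) / (1 + ad_penalty * ad_count)


# Same relevance and votes; the second page carries ten ad units.
print(page_score(1.0, 40, 5, ad_count=1))
print(page_score(1.0, 40, 5, ad_count=10))
```

The Wilson bound is used rather than a raw upvote ratio so that a page with 9 of 10 positive votes does not outrank one with 90 of 100.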

------
vaksel
So how exactly does this guy propose content sites get paid if they can't use
advertising to support themselves?

Are they supposed to charge for access to read things that are readily
available online?

~~~
dangrossman
Real content sites aren't the problem, it's the sites that consist of nothing
but thousands of pages of filler text and ads. A lot of them take articles off
"free to reprint" article sites, and run them through programs that rearrange
sentences and swap words with synonyms, to create "unique" content. Once
Google finds the site, they start listing the pages, unable to differentiate
between this garbage and real content.

~~~
vaksel
Except Google has a strict policy against sites like that, and smacks them
down the second they find them.

------
johnl
I've noticed the same problem: on some searches, the entire first page is pure
junk. I'd bet 80% of internet users don't know the difference. Maybe a
power-user mode that drops those from the search results would work.

