
Google already knows its search sucks (and is working to fix it) - shawndumas
http://venturebeat.com/2011/01/12/google-search/
======
JacobAldridge
I enjoyed that, but I'm not sure about a couple of the claims.

I switched from AltaVista to Google because it gave me better results almost
all of the time - if that weren't the case I, and however many millions of
others, wouldn't have switched. The 'expensive data centre theory' may have
sped up the demise of AV et al., but I don't think it's fair to say Google
succeeded by having low costs rather than a superior product.

I'd also like to see data to back up the claim that _"The vast majority of
users are no longer clicking through pages of Google results"_. Again, not my
experience, but I recognise that I am a datapoint of one. I do note that
increasingly Google's own answers (especially maps and images) are providing
me with the direction I need in response to a search query, but even then I
usually click through.

Edit: Re-read that. At first I thought it meant clicking through to the pages
that are returned as results; it may mean not clicking through the pages of
Google results (1-infinity below). Still, I thought <5% of people ever clicked
through to the second page (most people refined their search if the first
results weren't what they wanted) so I'm not sure if anything has changed.

I think Google search is damaged: not yet a product failure, but not yet "no
longer" a problem either.

~~~
jobu
Swu said it best:

"Google lacks a feature that it should have added years ago:

A search user who is logged in should have the ability to block entire domains
from all future results.

The benefits of this are many. The cost is very low.

Why is this option not already available? Google - we depend on you. Do it."

~~~
mistermann
>Why is this option not already available?

I think the answer to this question is the same reason why google's results
have become spammy. Allowing users to exclude specific domains, or even having
a "report as content farm" button, are in direct conflict with google's
business model, to a certain degree.

~~~
gwern
I wonder how much is due to user-friendliness. An invisible global blacklist
on my search results? How could that possibly go wrong...

~~~
quanticle
Well, does it have to be an invisible global blacklist? Would it be possible
to create personal blacklists and have one for each user? I mean, Google
already customizes search results for logged in users, so this wouldn't be too
far of a stretch.
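A per-user blocklist like the one quanticle describes could be applied as a post-filter on each user's result list. The sketch below is hypothetical throughout: the `BLOCKLISTS` store, the user ids and the `.example` domains are invented for illustration, and none of it reflects how Google stores preferences.

```python
from urllib.parse import urlparse

# Hypothetical per-user blocklists: user id -> set of blocked domains.
BLOCKLISTS: dict[str, set[str]] = {
    "alice": {"contentfarm.example"},
}

def is_blocked(url: str, blocked: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def filter_results(user: str, results: list[str]) -> list[str]:
    """Drop results whose domain the logged-in user has blocked."""
    blocked = BLOCKLISTS.get(user, set())
    return [url for url in results if not is_blocked(url, blocked)]

results = [
    "http://www.contentfarm.example/ipod-connectivity",
    "http://www.apple.com/ipod/",
]
print(filter_results("alice", results))  # ['http://www.apple.com/ipod/']
print(filter_results("bob", results))    # unchanged: bob has no blocklist
```

Matching on the domain suffix (`.contentfarm.example`) means a block also covers subdomains, without accidentally blocking lookalike hosts such as `notcontentfarm.example`.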

~~~
gwern
> Would it be possible to create personal blacklists and have one for each
> user?

I think that's pretty clearly what we were talking about. There are invisible
global blacklists already, as should come as no surprise (even if you haven't
run into one of the hits omitted thanks to the DMCA).

------
KirinDave
> But the secret to Google’s success was actually not PageRank, although it
> makes for a good foundation myth.

From an algorithmic standpoint, I'm reasonably certain that pagerank wasn't
the primary factor that catapulted Google's results in front of competitors.

I think that Google's algorithmic secret—initially, at least—was to use
inbound link text as part of the index for a page. You don't hear people talk
about it much, but this feature is one of the more powerful (and difficult)
optimizations you can make to a hyperlink database search index.
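The core of KirinDave's point is that the words in a link's anchor text are indexed against the page being linked *to*, so a page can match terms it never contains itself. Here is a toy sketch, assuming a tiny hypothetical crawl and a plain set-based inverted index. It says nothing about how Google actually built its index.

```python
from collections import defaultdict

# Hypothetical toy crawl: each page has body text plus outbound links,
# and each link carries the anchor text used on the linking page.
pages = {
    "a.example": {"text": "my homepage", "links": [("c.example", "great search engine")]},
    "b.example": {"text": "bookmarks", "links": [("c.example", "search engine")]},
    "c.example": {"text": "enter your query", "links": []},
}

def build_index(pages):
    """Inverted index term -> URLs, built from body text AND inbound anchor text."""
    index = defaultdict(set)
    for url, page in pages.items():
        for term in page["text"].lower().split():
            index[term].add(url)
        for target, anchor in page["links"]:
            # Credit the anchor words to the target page, not the linking page.
            for term in anchor.lower().split():
                index[term].add(target)
    return index

def search(index, query):
    """Conjunctive (AND) query: URLs that match every query term."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

index = build_index(pages)
print(search(index, "search engine"))  # {'c.example'}
```

`c.example` never says "search engine" anywhere in its own text. It is found only because other pages describe it that way.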

------
johnyzee
So inbound links can be faked but Facebook likes can't? Please.

Also, Google won because of better search, to such an extent that all they
needed was word-of-mouth to completely demolish the competition in the late
nineties and I don't recall any indications that Yahoo or Altavista couldn't
scale their technology. Back then using Google was simply like using a
different, infinitely more usable internet.

Finally, anecdotally, Google is a lot more difficult to game than any other
search engine. It is pretty clear to me that inbound links carry a lot more
weight with Bing and Yahoo, whereas Google includes several other metrics in
how to weigh search results (including, fairly or not, significant emphasis on
how long a site has been around).

~~~
loumf
Likes can be faked, but what if Facebook only counts likes in my circle -- or
weights them heavily?

The real problem is that likes on pages aren't used enough for them to be
useful.

~~~
coliveira
Until now, you mean. Facebook is training its users to add everything they
like, and that includes web pages you can share.

------
simias
I am not a Google fanboy, actually I've been using duckduckgo for a couple of
weeks as my default search engine. However, I've done it for the privacy
features of ddg, not because I think google search "sucks" more than it used
to.

Maybe I'm blind. Maybe I don't pay enough attention. Maybe I just use my
search engine differently than most (or at least than those who seem to
complain a lot lately) but I fail to find a concrete example where google (or
ddg for that matter) gives me results that are obviously "wrong" (whatever
that means).

I think we've all learned to speak "google". When you query a search engine
you don't speak English. You don't ask (or do you?) "I want articles
concerning java on the android platform on the hacker news website". Instead,
you say "site:news.ycombinator.com android java". And most of the time I get
what I want. Maybe I'm more easily satisfied than most. At any rate, these
days it's important to learn this language to effectively use any search
engine.

In this particular article, the author states that searching for "iPod
Connectivity" doesn't yield many results that "actually answer your query". My
question is: as a human being, what kind of results do you expect when you
search for "iphone connectivity"?

I'm not playing dumb. Is there one good way to interpret this query? Are you
shopping? Are you looking for specs? What kind of connectivity are you looking
for anyway?

I did the query just now. The third link targets apple.com, the fifth
amazon.com. The rest is a bunch of websites doing reviews or selling iphone
parts. I can't really judge whether those sites are legit or not; some do look
fishy. But at any rate, why do you say they are "bad answers"?

I do have my gripes with google. Experts Exchange used to be a huge pain in the
ass, but it's mostly gone these days (probably more thanks to stackoverflow
than google, that's for sure). Google has also _never_ given me any results on
google groups, even though it will gladly give me some ad-ridden usenet mirror.
I think Google has effectively helped destroy usenet that way.

It seems people want google to answer queries such as "What phone should I
buy?" and have google give the _right_ answer in the first link. It seems some
people took the church of google a little too literally. The day Google knows
how to answer that, it will probably follow up by sending a mechanical
Schwarzenegger clone back in time to kill Bill Gates' mother.

To sum it up: of course Google may want to fight the various ad farms out
there even harder, but before you criticize the answer, ask yourself whether
you've asked the right question.

~~~
quanticle
_I think we've all learned to speak "google". When you query a search engine
you don't speak english. You don't ask (or do you?) "I want articles
concerning java on the android platform on the hacker news website". Instead,
you say "site:news.ycombinator.com android java". And most of the time I get
what I want. Maybe I'm more easily satisfied than most. At any rate, these
days it's important to learn this language to effectively use any search
engine._

You've learned to speak "Google". The majority of the world hasn't. My mom,
for example, would search for "how to bake apple pie", rather than "apple pie
baking" or any other such set of key terms.

In fact, your approach reminds me of the approach that I used with AltaVista
(back when it was the premier search engine). I used this approach because one
of my teachers taught it to me. Back then, it was understood that search
engines didn't understand human language and that search queries had to be
carefully crafted to return optimal results.

Of course, very few people think about that now. No one spends time crafting a
search query - they just type their question into the Google box and click
search. As Google search results decline in quality we might see a
resurrection of the old way of thinking about and crafting search queries to
return the optimal result set.

------
cpr
This article whitewashes the issue by first pointing out the problem, and then
making some vague hand-wavy claim that Google is providing direct answers
immediately that obviate further search.

This is nonsense.

I think the whole world knows that for many kinds of general searches (e.g.,
appliance reviews when shopping), Google has been completely overrun by spammy
content.

I sure hope they're working on something.

~~~
jkent
Can you give specific examples where the search returns a lot of spammy
content?

~~~
bertil
Common examples that are given are consumer goods, like household appliances.
I tried ‘hoover’, ‘washing machine’ and ‘hair dryer’, and I got relevant
results: deep links into relevant and safe stores, both web-only stores and the
web arms of brick-and-mortar ones. However, I'm searching from abroad (France)
and not in English, obviously.

~~~
pyared
i use thefind.com for product search instead of google. they bought like.com
which will hopefully help them out in this area. peter yared (author of the
article above)

------
DanielBMarkham
I wish Google and the other search engines well.

A couple of minor additions to the article for context based on what little
bit I've been learning:

\- About 25%-30% of Google searches each day are searches that Google has
never seen before

\- While 90% of bad results are just, well, bad results, there is significant
room for interpretation. In the example provided, is there a reason Google
should return first in a search for PageRank? If so, what is it (described
technically, not emotionally)? It may be (but probably isn't) that these other
PR firms are actually honestly more cited than Google is. I'm sure this isn't
the case, but whenever somebody says "And the result wasn't what I liked" I
try to take a careful look at what they are saying. Sometimes it's that they
had an academic reason for the search that wasn't validated. Sometimes it's
that their opinion of what is popular and the rest of the internet's is
different. Most of the time the system is gamed, yes, but there are times in
which the author is just expressing an emotional dislike of the results in
good/bad terms. Search isn't something that has a "right" result. You are
either generally kinda pleased with it or you aren't. (In fact, blind studies
show other search engines consistently scoring higher than Google, but when
the participants knew it was Google, then Google scored higher. There is a lot
of room in this topic for personal opinion and stupid human tricks)

Since it's all so much based on human reactions, what could happen is that
Google could fix the problem and nobody would notice. Or they might not fix
the problem and everybody thinks they did. It's all about perception.

I've tried DDG and Bing, and I'm still with Google. At least for now.

~~~
jedsmith
> About 25%-30% of Google searches each day are searches that Google has never
> seen before

Can you cite that? That's very interesting.

~~~
DanielBMarkham
I read it in "The Art of SEO" which I just finished.

I believe it was sourced as part of a speech from a Google VP sometime in 2007
or 2008.

What's happening is that the scammers are training the users to use longer and
longer search queries. It's much harder to trick a system that is working off
of 6 keywords than it is a system that is only using 2. Long-tail stuff
continues to get more and more important.

~~~
dhimes
Interesting-- see this thread <http://news.ycombinator.com/item?id=2099774>

~~~
DanielBMarkham
I wrote a longer reply, but the more I write the more I realize how ignorant I
am. I'm out of my depth. Beats me.

I think we can agree that if it is true that 25% of all searches are
completely new to Google, it's not like they could have been gamed. Can't game
a search that has never existed before. Right?

~~~
byrneseyeview
I've optimized for queries that have never happened before. A big fraction of
that 25% comes from hyper-specific queries from a known pool of terms. One
could optimize for, e.g.:

[size] [color] [quality] widgets

And have a page that ranked for:

Brobdingnagian Fuchsia Middling Widgets

Even if that term had never been searched. You wouldn't have to try too hard--
your bespoke widget company could just list all of the types of widgets it
could potentially manufacture.
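The arithmetic behind this is a Cartesian product: a few values per slot multiply into many distinct long-tail phrases, none of which needs to have been searched before. Here is a sketch with made-up attribute lists:

```python
from itertools import product

# Hypothetical attribute pools for the widget example.
sizes = ["Tiny", "Brobdingnagian"]
colors = ["Fuchsia", "Teal"]
qualities = ["Middling", "Premium"]

def product_titles(sizes, colors, qualities):
    """One page title per [size] [color] [quality] combination."""
    return [f"{s} {c} {q} Widgets" for s, c, q in product(sizes, colors, qualities)]

titles = product_titles(sizes, colors, qualities)
print(len(titles))  # 8, i.e. 2 * 2 * 2
print(titles[0])    # Tiny Fuchsia Middling Widgets
```

With, say, 20 sizes, 50 colors and 10 quality grades, the same function would produce 20 × 50 × 10 = 10,000 titles.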

------
brudgers
> _"Google is in the unique position of being able to learn from billions and
> billions of queries what is relevant and what can be verticalized into
> immediate results."_

That's the problem with search in a nutshell. I don't want what Google thinks
is relevant, I want what I think is relevant. As the article points out, for
searches which might be monetized, Google treats monetization as a highly
relevant factor (unsurprisingly since the relevance of search terms to the
advertising they sell is the basis of their business).

In other words, over time Google has worked hard to "curate" search results
(even if the curation of links is primarily done in bulk rather than more
selectively) so that results are tied to your geographic location (e.g.
"football stadium" is likely to return vastly different links in the US
compared to the rest of the world.)

Localization and monetization go hand in hand. For example "weather" provides
a generic local forecast with options for detailed forecasts from three
commercial sites: Weather Channel, Weather Underground, and Accuweather. But,
tellingly, it does not provide a link directly to a NOAA local forecast, which
contains the most up-to-date, complete, and reliable information, even though
providing such a link is trivial.

Yes Google looks at billions of searches, but in order to monetize those
searches to the greatest degree possible. Their analysis is to see what the
traffic will bear, not to make the results more relevant to the user.

------
krosaen
"It’s a popular notion these days Google has lost its “mojo” due to failed
products like Google Wave, Google Buzz"

Really? Treating a single one-sided TechCrunch article (which contrasts with
several other recent TC articles marveling at innovations produced by google,
such as real-time translation) as popular belief didn't make me want to
continue reading.

But I did. Some interesting points about the vertical results being important,
but the implication that the core results are a wasteland that google has
ceded seems pretty unsubstantiated.

~~~
axod
Yeah it's convenient to not mention the massive successes like Chrome and
Android...

~~~
dcreemer
Though these products/projects are certainly doing very well and look
promising, I'll call them "massive successes" when they start to make
significant contributions to Google's profit or even at least revenue.

~~~
Matt_Cutts
It's not always about profit and revenue. By your definition, Apache and Linux
could not be considered successes.

Personally, I think gaining 10-15% of the browser/smartphone market in a
couple years is pretty good. And while the direct success of Chrome has been
nice, it's also been a kick in the pants for the entire browser industry,
which has responded with better, safer, and faster browsers for everyone.

------
stcredzero
Here's an easy place to start: detection of plagiarism. I think Joel Spolsky
said something about SEO spammers copying content from stackoverflow then
editing it to be search engine optimal and posting it without a link back. It
seems feasible to detect this situation and actually exploit it!

It should be possible to 1) extract the optimizations and make them available
to the original site and 2) start bringing the legal hammer down on the SEO
spammers for violating the terms of service of sites like stackoverflow.
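One textbook way to detect copy-and-tweak pages is w-shingling. You break each page into overlapping word sequences and compare the resulting sets with Jaccard similarity. This technique is swapped in here for illustration; neither Spolsky nor Google is claimed to use it. The shingle size `k=3` and the sample strings are arbitrary.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (contiguous word sequences) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

original = "to reverse a list in python use the reversed builtin or slice with a step of minus one"
copied = "best python tips: to reverse a list in python use the reversed builtin or slice with a step of minus one"
unrelated = "futon filling comes in cotton foam and wool blends"

print(round(jaccard(shingles(original), shingles(copied)), 2))
print(jaccard(shingles(original), shingles(unrelated)))  # 0.0
```

A high score flags the copied page as a near-duplicate even though it adds its own keyword-laden prefix. At web scale, the pairwise comparison would normally be approximated with something like MinHash rather than computed exactly.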

------
comex
A lot of this is nonsense.

> If you search for any topic that is monetizable, such as “iPod Connectivity”
> or “Futon Filling”, you will see pages and pages of search results selling
> products and very few that actually answer your query.

I suspect that most people searching for those things _want to buy them_. I
Googled "iPod Connectivity" and the results (Amazon, Consumer Reports, Apple)
seemed like a good selection of links for someone who wants that.

> Case in point: The Google.com page that describes PageRank is #4 in the
> Google search results for the term PageRank, below two vendors that are
> selling search engine marketing.

Actually, the google.com page that's number 4 (for me) is very vague about
PageRank (essentially useless for someone who wants to learn about it). The
three links above it are a PageRank checker, Wikipedia's description of
PageRank, and an article about how to optimize your PageRank. While the last
article is somewhat scummy, there is a good chance that optimizing PageRank is
_what the searcher was looking for_.

> The vast majority of users are no longer clicking through pages of Google
> results: They are instantly getting an answer to their question:

These kinds of "vertical search results" only appear for very simplistic
queries; while it's handy to be able to instantly find out the "sf weather",
anyone wanting even slightly more specific information needs to contend with
the blue links.

I doubt they're "clicking through pages" (i.e. going to the second page and
beyond), but that's hardly an indictment of the quality of the blue links--
rather the opposite.

------
6ren
> But the secret to Google’s success was actually not PageRank, although it
> makes for a good foundation myth.

I do love the story of _algorithm wins_ , but is there any evidence at all
showing how important PageRank was/is, either way? I think it is hard to be
sure exactly why something is popular, even when it's happening, so I suspect
there isn't much evidence either way. There are two parts to the issue:

    (1) did PageRank give "better" results, for what users wanted?
    (2) was this an active factor in their preference for google?

I've heard the argument that the speed of results is a very significant factor
(as the article claims), and that was definitely a factor for me. Also, I
recall research from google showing the dramatic effect on user satisfaction
from even slight differences in latency (above a perceived-as-instantaneous
threshold).

Another factor at the time was that google didn't have paid ranking, so search
results were better in this sense - related to this is a psychological trust
issue, which would make people feel more comfortable with google, even if the
improvement in search results was insignificant. You didn't have to suspect
the results.

Possibly the crucial factor (at that time) was that everyone else had crowded
homepages; whereas google was simple and sparse and just did search.

Both of the last two have been eroded, as the competition copied google. One
would think they could also catch up on latency - it's certainly easier for
any search engine that processes fewer queries. Google retains the advantage
of familiarity, which surprisingly is one of the strongest competitive
advantages for _consumer_ goods, where the technology doesn't change much (e.g.
gum and cola).

So... provided google keeps up with the competition technically, it will
trounce them commercially.

------
jkent
I switched from AltaVista because Google.com was faster and less full of spam.
I still think that is the case compared to other search engines for common
searches.

That being said, more and more searches are using ever increasing numbers of
search term words, and that can get spammy.

Google make no secret of investing in search and the presence here of Matt
Cutts is testament to the fact that we(?) are listening, with very frequent
algorithmic tweaks and responses.

I don't feel it's useful comparing Facebook's "like" system with a search
engine. Whilst it is useful to know what friends think of things, this can
also be gamed by smart marketers.

I don't think it sucks, but there are use cases (and perhaps certain users) for
which the experience does suck. I for one have been using the 'spam flag'
extension for Chrome and feel that something can be done with the results.
Perhaps this will help?

(my opinions are my own and not necessarily that of my employer).

~~~
dhimes
_more and more searches are using ever increasing numbers of search term
words, and that can get spammy_

Why is that? I thought more words would give more focussed, less spammy
results.

On another point it seems to me that Google would have to tread carefully here
if the set of spammers has a large intersection with the set of folks who buy
advertising from Google. I would (however naively) suspect this might be the
case.

~~~
_delirium
> Why is that? I thought more words would give more focussed, less spammy
> results.

I've noticed that if you use too many words, you more often end up with
keyword-stuffed or aggregated/scraped pages, because they just happen to use
all the words in your search query on the same page. Sometimes this is because
there aren't actually any real results for the particular narrow niche you
wanted, but sometimes it's just because none of the real results use all of
the keywords you tried verbatim, and Google wasn't able to figure out that
those non-verbatim matches were more relevant than the exact-match-but-crap
pages (admittedly a hard problem).

~~~
dhimes
Ah, I see what you mean. These link farms are getting clever, and using
content "real" enough to pass the automated sniff test.

~~~
_delirium
Part is also what counts as spam. Google doesn't count a lot of index-type
pages as spam, so if you search for a conjunction of a few programming terms,
[scala foo bar baz], you often get a page that indexes blogs on a topic, in
this case Scala. The post titles on that page will use all of your search
terms, but often in _different_ post titles, effectively erasing the
conjunction operator in your query.
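The effect _delirium describes is the difference between requiring all the query terms on the same *page* and requiring them in the same *post*. Here is a toy illustration, with hypothetical post titles standing in for a blog index page:

```python
# Hypothetical index page that lists blog post titles about Scala.
index_page_titles = [
    "Scala foo patterns",
    "Understanding bar in Scala",
    "baz without tears",
]

def matches_page(query: str, titles: list[str]) -> bool:
    """Page-level AND: every term appears somewhere on the page."""
    page_words = {w.lower() for t in titles for w in t.split()}
    return all(term.lower() in page_words for term in query.split())

def matches_any_title(query: str, titles: list[str]) -> bool:
    """Post-level AND: every term appears within one single title."""
    terms = [t.lower() for t in query.split()]
    return any(all(term in {w.lower() for w in title.split()} for term in terms)
               for title in titles)

q = "scala foo bar baz"
print(matches_page(q, index_page_titles))       # True
print(matches_any_title(q, index_page_titles))  # False
```

The index page satisfies the conjunction as a whole, even though no single post on it is about all four terms.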

------
chopsueyar
AltaVista would consistently show adult websites as the top link for most
search results.

As far as Facebook goes, the 'Like' button is great for things of a viral
nature and trending topics, but over the long term, I have my doubts it can be
used as a search engine.

Prove me wrong, Zuck!

------
petervandijck
"The vast majority of users are no longer clicking through pages of Google
results" -> That doesn't sound right.

~~~
gaiusparx
Maybe it means most people will give up instead of trying to click through too
many pages of results. It might be more effective to go to specific sites
sometimes, for example Wikipedia for general info, IMDb for movies,
stackoverflow for programming topics, and amazon.com for product reviews.

~~~
franze
Stack Overflow (plus the other Stack Exchange sites) gets 88.2% of its traffic
from Google search. <http://www.codinghorror.com/blog/>

So even if it is a nice story ("direct traffic is becoming more and more
important"), it is ultimately just b*ullsh*t.

------
xster
"But Google’s fixing it." What are they doing exactly? Don't believe it's
mentioned in the article.

------
sdh
"Over the past couple of years, Google has progressively added vertical search
results above its regular results. When you search for the weather,
businesses, stock quotes, popular videos, music, addresses, airplane flight
status, and more, the search results of what you are looking for are presented
immediately. The vast majority of users are no longer clicking through pages
of Google results: They are instantly getting an answer to their question:"

Is this Google fanboy speak? Sorry, but I don't care how quickly the gamed
results appear or how local they are, they still don't answer my questions.

This article is pointless.

------
dmethvin
> Facebook, which can rank content based on the number of Likes from actual
> people rather than the number of inbound links from various websites, can
> now provide more relevant hits, and in realtime since it does not have to
> crawl the web.

Unfortunately, the Like button means too many things. It doesn't just mean you
like the content on that one page, it means you like the creator of that web
site enough to let them put updates in your news feed. And it means you don't
mind telling all your friends you like it.

------
Shorel
For any given search, the user can type a few words, click on a few results,
and go back to the search results until he finds what he wants.

He can also change the words in the search in the process.

The last click, after which the user stops searching, is, for this particular
user, the most relevant result for any of the search strings he has used.

I don't know any search engine that uses this data.
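Shorel's idea can be sketched as crediting the final click of each search session to every query typed in that session. The session-log format and URLs below are hypothetical, and this thread does not establish whether any engine uses such a signal.

```python
from collections import Counter

# Hypothetical session log: each session is the ordered list of
# (query, clicked_url) events for one user's search task.
sessions = [
    [("iphone connectivity", "shop.example/cables"),
     ("iphone bluetooth specs", "apple.example/specs")],
    [("iphone bluetooth", "apple.example/specs")],
    [("iphone connectivity", "forum.example/thread")],
]

def last_click_votes(sessions):
    """Credit each session's final click to every distinct query typed in it."""
    votes: dict[str, Counter] = {}
    for events in sessions:
        if not events:
            continue
        final_url = events[-1][1]  # the click after which the user stopped
        for query in {q for q, _ in events}:
            votes.setdefault(query, Counter())[final_url] += 1
    return votes

votes = last_click_votes(sessions)
print(votes["iphone connectivity"])
```

Only the final click counts. The intermediate click on `shop.example/cables` earns no vote, because the user kept searching after it.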

------
watty
I'd say it's a stretch to say "sucks" but yes, spam needs to be filtered.

------
ot
> The much acclaimed PageRank algorithm,

I stopped reading here. Really, do people still believe that PageRank has any
significant weight in the complex ranking machinery of web search?

~~~
ot
Just to clarify, because of the downmods.

Search engines rank the pages according to a variety of (hundreds of) "signals",
some of which are global (PageRank, ...), some "local" (local link structure,
page quality, ...) and some measuring the relevancy of the page for the query
(BM25, ...).
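BM25, one of the query-relevance signals mentioned above, fits in a few lines. This is the textbook Okapi BM25 formula with the common defaults `k1=1.2` and `b=0.75`, plus the smoothed IDF that Lucene uses. The exact variant any particular search engine uses is unknown, and the corpus is made up.

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        n_q = sum(1 for d in corpus if term in d)  # documents containing term
        idf = math.log(1 + (n_docs - n_q + 0.5) / (n_q + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Hypothetical three-document corpus, already lowercased and split on spaces.
corpus = [
    "pagerank is one of many ranking signals".split(),
    "buy cheap futon filling online".split(),
    "pagerank checker seo tools".split(),
]
query = ["pagerank", "signals"]
scores = [bm25_score(query, d, corpus) for d in corpus]
print([round(s, 2) for s in scores])
```

The first document matches both terms (including the rarer, higher-IDF "signals"), so it outscores the PageRank-checker page. The futon page scores exactly zero.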

All these signals are blended together usually with a function which is
regressed using some machine learning algorithm, to fit human judgements and
(possibly) click data. Look for "learning to rank" on Google Scholar for
details, there are papers by Google, Yahoo and Microsoft.
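The simplest ("pointwise") version of this blending fits a linear model to human relevance grades by least squares, then sorts pages by predicted score. The sketch below uses invented features, grades and hyperparameters. Real learning-to-rank systems use far more signals and often pairwise or listwise objectives instead.

```python
# Hypothetical training set: one feature vector per (query, page) pair,
# [pagerank, bm25, page_quality], plus a human relevance grade from 0 to 3.
X = [
    [0.9, 0.1, 0.5],
    [0.2, 2.0, 0.8],
    [0.5, 1.2, 0.1],
    [0.1, 0.2, 0.2],
]
y = [1.0, 3.0, 2.0, 0.0]

def predict(w, bias, x):
    """Blended score: weighted sum of the signals plus a bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + bias

def fit_linear(X, y, lr=0.1, epochs=10_000):
    """Pointwise learning to rank: fit weights by gradient descent on squared error."""
    n, d = len(X), len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, bias, xi) - yi
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n  # d(mean squared error)/d(w_j)
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias

w, bias = fit_linear(X, y)
ranking = sorted(range(len(X)), key=lambda i: predict(w, bias, X[i]), reverse=True)
print(ranking)
```

In this toy setup, PageRank is just one column among several. Its learned weight is whatever best fits the judgments, which is the sense in which it is "one signal among many".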

Google itself admits that PageRank is not very relevant
(<http://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking#pagerank>):

> [...] worry less about PageRank, which is just one of over 200 signals that
> can affect how your site is crawled, indexed and ranked.

------
RudolphHalmo
The only thing you want to give me is what's on your plate. How about
unambiguous neutrality?

------
xyzzyb
Too late, I've already switched to DuckDuckGo.

------
aneth
What a shamelessly self-promotional piece. It starts out pointing out how
everyone else is slow for not noticing this as soon as he did, and concludes
by reminding you that he was the very first to make fun of Google. Is no one
else bothered by this? I find it hard to read without cringing because his
goal is obviously to write history with him at the center, not to make any
interesting point.

Furthermore, the central claim is a massive reference-less overstatement that
I think is almost certainly false:

> The vast majority of users are no longer clicking through pages of Google
> results: They are instantly getting an answer to their question:

