
Why We Desperately Need a New (and Better) Google - vyrotek
http://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2/
======
danilocampos
An obvious improvement to Google whose absence shocks the hell out of me would
be this:

Personal domain blacklist.

There's a lot of spammy bullshit on the web, and Google seems to have given up
on keeping it away from me. Fine. But for my specific searches, there's
usually a handful of offenders who, if I never, ever saw them again, would
improve my search experience by an order of magnitude.

So let me personalize search by blacklisting these clowns. Why can't I filter
my search results so that when I search for a programming issue, I never see
these assholes from "Efreedom" who scrape and republish Stack Overflow?

I don't, personally, need an algorithmic solution to spam. Just let me define
spam for my personal searches and, for me, the problem is mostly solved.

(Also blacklisted: Yahoo Answers, Experts Exchange.)
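
A minimal client-side sketch of that feature (the domain list and the result format are invented for illustration):

```python
from urllib.parse import urlparse

# Hypothetical personal blacklist -- domains this user never wants to see.
BLACKLIST = {"efreedom.com", "answers.yahoo.com", "experts-exchange.com"}

def filter_results(results):
    """Drop any result whose host matches (or is a subdomain of) a
    blacklisted domain. `results` is a list of (title, url) tuples."""
    kept = []
    for title, url in results:
        host = urlparse(url).netloc.lower()
        if not any(host == d or host.endswith("." + d) for d in BLACKLIST):
            kept.append((title, url))
    return kept

results = [
    ("Real answer", "http://stackoverflow.com/questions/123"),
    ("Scraped copy", "http://www.efreedom.com/Question/123"),
]
print(filter_results(results))
```

The point is how little machinery a personal blacklist needs: a set lookup per result, no algorithmic spam detection at all.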

~~~
steveklabnik
Just a small note, SO puts all of its content under a Creative Commons
license[1][2], which they are (in my non-lawyerly opinion) following.

Now, this doesn't mean that filtering them wouldn't be useful to you, since at
first glance it appears they're solely a duplicate. Just pointing out that
they're not actually doing anything wrong, and they're (probably) not
scraping.[3]

SO has specifically said that this is okay.[4]

1: <http://wiki.creativecommons.org/Case_Studies/StackOverflow.com>

2: <http://creativecommons.org/licenses/by-sa/2.5/>

3: <http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/>

4: <http://blog.stackoverflow.com/2009/06/attribution-required/>

~~~
danilocampos
> SO has specifically said that this is okay.

It doesn't look like Jeff is _that_ okay with it, especially when it comes at
the cost of Stack Overflow's own ranking:

 _Sorry, this is absolutely necessary, otherwise we get demolished by scrapers
using our own content in Google ranking._ – This is from a question about
Stack Overflow's SEO strategy:

<http://meta.stackoverflow.com/questions/71906/first-tag-in-the-title-of-the-page-is-not-that-convenient>

~~~
steveklabnik
Yeah, it's not a _great_ situation. I'd probably filter them out too. But then
again, maybe not. This is a complicated topic, and I'm still not quite
awake...

~~~
danilocampos
I mean, their use of SO's content may be _legal_ but it doesn't mean they
aren't dicks. It's a wholesale ripoff of another site's content that adds
absolutely no value to anyone but the publishers themselves. It inconveniences
users and it harms the good work of Stack Overflow by robbing them of rankings
they deserve.

In the same way: I _can_ call someone's mother bad names because it isn't
illegal, but that doesn't mean I _should_, because I can follow the letter of
the law 100% and still be an asshole. Overall, my policy is that it's best
_not_ to be an asshole, and it annoys me when others can't share that basic
ethos.

~~~
_delirium
The same happens with Wikipedia as well. It's free-content, because that's
sort of the point of the project. And reusing that content is great and
encouraged. But just rehosting the exact contents of en.wikipedia.org with ads
slapped on is a bit lame. Legal, but it's not any sort of interesting reuse,
just adding more noise to the internet.

~~~
jacquesm
Wikipedia gets so much traffic from google that the harm there is minimal
compared to what's happening to stackoverflow.

------
cletus
This issue in a roundabout kind of way touches on Facebook.

The issue of social search has a lot of mindshare. Some think it is the future
of search. I disagree.

One of the things that made search successful and useful early on was scale.
Instead of having to go to the library or ask your friends, you can
effectively canvass the connected world.

I find the notion that friends' recommendations will replace that as nothing
short of bizarre. It's like a huge step backwards. The argument is that you
can filter out the garbage as your social graph will provide a level of
curation.

Let me give you a concrete example. If I wanted to buy a camera I'd still need
to go to dpreview and other sites. It's highly likely that my friends don't
really know a lot about this (but some will have an opinion anyway).

This same idea of human curation is behind such sites as Mahalo and, to a
degree, the garbage sites themselves. Of course, at some point computers will
be powerful enough to generate this garbage content.

Blekko's idea of slashtags is interesting (to a degree), but if it's
successful it's easily reproducible. Google is still in the box seat here, but
of course that's no barrier to a link-baiting TC title.

Personally I'm an optimist. I believe that, much like email spam, the garbage
from AC, DM and others is a transitional problem (email spam is basically a
solved problem now if you use a half-decent email provider). If they succeed
we won't be able to find anything. I don't believe that'll happen, so these
services are therefore doomed.

So betting on Demand Media is (to quote Tyler) like betting on the Mayans
(meaning betting they're right about the world ending in 2012: it doesn't
really matter if you're right).

So my money is on Google being the better Google.

~~~
richcollins
How does filtering based on a trust network imply that the network will only
include your friends?

~~~
beoba
Well, at that point, your 'trust network' may as well include the blog/review
site that you would have found through a normal search to begin with.

~~~
richcollins
You would only consider the people you trust on a specific topic when ranking
results (not all backlinks).

------
Matt_Cutts
"Google does provide an option to search within a date range, but these are
the dates when website was indexed rather than created; which means the
results are practically useless."

I believe the author is mistaken on this point. Quick proof is to do a search
for [matt cutts] and you'll see the root page of my blog. Click "More search
tools" on the left and click the "Past week" link. Now you'll only see pages
created the last week, even though lots of pages on my site were indexed in
the last week.

~~~
amichail
BTW, why is it that Google has not opened up its core search engine to third
party developers so that their code can be used to bring up some of the search
results by default (without requiring the user to subscribe to third party
features)?

Most people have never used Google's Subscribed Links feature because it is
not enabled by default.

User feedback can be used to determine which third party code to use in which
contexts. Spam/unhelpful features would be detected quickly. There would be
intense competition among third party developers for highly desired features.

Something like this could give you Wolfram Alpha like features among other
things such as custom UIs for various searches (e.g., travel).

The Google App Engine could be used for computation. You could pay third party
developers by how often their code is used in search results.

~~~
Matt_Cutts
Just speaking for me personally, I'd be a fan of trying ideas like that. The
potential danger would be that some code could introduce latency or other
things that would make the search experience worse instead of better.

~~~
webwright
Isn't that pretty similar to Steve Jobs' original argument against apps on the
iPhone?

~~~
jrockway
It seems more likely that the average developer would introduce latency when
searching through a database of billions of text documents than when making an
application that makes a fart sound when you press a button. Just sayin'.

------
DanielBMarkham
_This is exactly what blogger Paul Kedrosky found when trying to buy a
dishwasher. He wrote about how he began Googling for information…and
Googling…and Googling. He couldn’t make head or tail of the results. Paul
concluded that “the entire web is spam when it comes to major appliance
reviews”._

So I happen to know somebody who is taking a small section of the home
appliance market and creating content around it -- reviews, news, advice, a
place for other consumers to talk to each other.

Of course to do this you need to have income, so they are going to use some
sort of ad-supported model.

My question is very simple: is their project a spam site or not? To some, I
guess it would qualify. To others, not.

You see, there are two questions when it comes to search results: 1) Am I
being presented results that match the query I entered? and 2) Am I being
presented results that match what I want to know?

These are two entirely different things. A third-grader looking for
information on a movie star might find a games page with all sorts of
information on that star -- all sponsored by some kind of adsensey stuff. And
he's very happy. A researcher typing in the same question gets the same page?
He's pissed.

There is no universal answer for any one question. It's all dependent on the
culture, education, and intent of the user -- all of which are not easily
communicated to a search engine.

Look -- this is a real problem. I hate it. Sucks to go to pages you don't
like. All I'm saying is that it's more complicated than "we need a new
Google." Finding what you want exactly when you want it is a difficult and
non-trivial
problem. We just got lucky in that Google found a simple algorithm that can be
helpful in some situations. It may be that we're seeing the natural end of the
usefulness of that algorithm.

</soapbox>

~~~
jshen
You failed to address the main point: Google is filled with duplicate content,
aggregation sites, and content farms with shit content that has the right
statistical profile.

This is not the same problem as different people perceiving results
differently.

~~~
rhizome
Sure, but it can also have to do with different people perceiving _inputs_
differently. While I lament the content duplicators with a passion, I deal
with it myself with longer search queries: grouping and requiring terms,
excluding particular domains (like efreedom), and so on. I know we all
probably do this to some extent, but I've found that, sure, you have to
exclude 10 different domains, but it does winnow the results down to a useful
set. I'd rather be able to search with three words, but for now those days are
over.
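
That manual workaround can be scripted; a small sketch (the `-site:` operator is real Google query syntax, but the domain list here is just an example):

```python
# Personal list of domains to exclude from every query (example picks).
EXCLUDED = ["efreedom.com", "answers.yahoo.com", "experts-exchange.com"]

def build_query(terms):
    """Append a -site: exclusion for each unwanted domain to the query."""
    exclusions = " ".join("-site:" + d for d in EXCLUDED)
    return terms + " " + exclusions

print(build_query("python list comprehension"))
# e.g. 'python list comprehension -site:efreedom.com ...'
```

A browser search-keyword or a custom search URL template gets you the same effect without typing the exclusions each time.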

------
replicatorblog
It will be interesting to see how this impacts the Android/iOS battle. Search
revenue funds almost all of Google's other activities so if people start using
other search engines or find alternate ways to get their content it could
impact the level they can spend on phones.

With a push to a mobile first world the Android model is especially sensitive
to spam. On a full size browser you have a lot more context and results for a
given search. Five results may be spam, but you can work around them. If the
average phone screen shows 3-5 results and all of them are spam, you will
quickly find alternate tools.

Google ignoring spam is like Microsoft ignoring the cloud.

~~~
Matt_Cutts
The article calls out two specific companies as "landfill in the garbage
websites that you find all over the web." Reasonable people can disagree over
whether such content is truly spam or low-quality content, and thus how to
respond.

~~~
replicatorblog
What is the difference and how should they respond? It seems to be a rising
frustration among power users that Google is increasingly becoming a wasteland
populated by spam. For example, Marco Arment recently commented on his podcast
how hard it was to find answers to simple questions on Google these days. He
was saying that the content farms have basically created a page for every PHP
function with thin content and rendered it useless. For a company whose goal
is to index all human information it is a pretty big warning flag.

What is the appropriate user response? Go to Stack Overflow? Find a branded
knowledge base like O'Reilly's Safari? I'm genuinely curious to know what we
can do.

~~~
Matt_Cutts
I was referring to how Google should respond to content farms. Historically,
Google has been willing to take manual action on webspam. With the rest of
search quality and ranking, we try to use algorithms as much as we can. So the
distinction of whether something is spam vs. low-quality is an important one
within Google.

~~~
robryan
This is one of the reasons the SearchWiki approach was a good idea. Not
everyone has the same opinion; what one person finds helpful, another finds to
be low-quality content.

------
JusticeJones
Tell me, how exactly is writing a sensationalized article that targets one of
the Internet's oldest and largest communities, to get fed by CPM advertising,
any different from what they decry? People have said this time and time again,
but they never seem to debut, let alone promise, any sort of technology to
address the issue. They just leave that end of the deal up in the air. As if
to say that it's OK to spin topics as long as they strike a social nerve, but
those who're less graceful at the craft are undeserving of the benefits which
they themselves reap.

If the search giants had any balls they'd cut the "Internet Marketing"
community off at the knees. Because the money making methods pushed by that
community either don't work or are unsustainable, so they're entirely reliant
on a steady stream of new recruits. If they want to promote gaming your system
don't let them reap any benefits from it.

~~~
eitland
What about a three-strike approach like the one that was suggested a few days
ago with AdSense?

Domains frequently being excluded by power searchers could be a good signal.

(Googling for one that used to pester my search results, kods.net, it seems it
has finally been banned. Hooray!)

------
petervandijck
The argument being that Google is losing the war against spam. A new and
better Google will likely be Google itself. What we really need is a way to
discover content that's not search.

~~~
meterplech
For many things there is utility to social content finding. Sites like HN work
for news/discussion, and Twitter and Facebook work for a random amalgam of
things. I'm interested in the future of social search startups that somehow
curate content from friends. In the article's example: asking friends if they
liked their dishwasher, and if yes, what brand it is. That's most like how
people IRL make these decisions. I know there are some startups in this space
as well; hope they do well!

~~~
nhangen
Furthermore, I'd love to see a simple thumbs up or thumbs down system
integrated into the browser or Google search.

Did this page help you find what you were looking for? Was this page useful?

etc

~~~
ehsanul
The problem with that is, again, spam. How do you differentiate fake votes
from real ones? You can do per-IP limiting, and probably other things I'm not
aware of (please comment about them), but it's still pretty open to abuse I'd
say.
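
Per-IP limiting, at its simplest, is a sliding-window counter; a toy sketch (window and cap values are arbitrary, and as noted above, this alone is easy to abuse):

```python
import time
from collections import defaultdict

WINDOW = 3600      # seconds in the rate-limit window
MAX_VOTES = 5      # votes allowed per IP per window

votes = defaultdict(list)  # ip -> list of vote timestamps

def allow_vote(ip, now=None):
    """Accept a vote only if this IP has fewer than MAX_VOTES
    votes inside the trailing WINDOW seconds."""
    now = time.time() if now is None else now
    recent = [t for t in votes[ip] if now - t < WINDOW]
    votes[ip] = recent          # expire old timestamps
    if len(recent) >= MAX_VOTES:
        return False
    votes[ip].append(now)
    return True
```

The obvious weakness is exactly the one raised above: anyone with a botnet or a pool of proxies has plenty of IPs, so this only raises the cost of abuse rather than preventing it.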

~~~
nhangen
I suppose you do it similarly to the way Amazon or iTunes implements it for
reviews...you make sure the user is logged in via a Google account.

Sure, it could be exploited, and I'm guessing that's why they haven't
implemented it, but there's got to be a solution that would make it work.

------
tokenadult
Let me see if I correctly understand the learned professor's article. In his
view, the problem is that a user using a free search engine to find
information will find a lot of information about people who want to sell
products and services, gaining money by exerting their time and effort. What
he hopes to obtain for free is email addresses of persons to whom he wants to
send his survey, so that he can use their time and effort without compensating
them to produce something of value to him. Exactly how is this a problem?

People who actively like to be contacted by random persons surfing the
Internet make their contact information readily available (and answer
questions sent through those publicly visible contact channels). But to many
other persons, not being readily visible on the Internet is a feature rather
than a bug. (Disclaimer: my contact information is readily visible on the
Internet, so readily visible that it has been used by point-of-view pushers on
Wikipedia to give me harassing telephone calls.)

~~~
tedunangst
I think he was actually trying to piece together the background stories for
the people who didn't respond to his emails, not find their emails. I don't
think the info he was searching for was particularly private either.

For instance, trying to find out the company a CEO worked at before their
current one. The problem is that content copiers will produce so many copies
of the PR announcement for their current job, it's impossible to find the
announcement for their previous job. I've tried doing this exact search and
it's very frustrating.

~~~
narrator
I often find that things that don't really exist wind up with spam results.
For instance, if you type "free ipad" into google you will likely get
thousands of search results, all spam, because free iPads don't actually
exist.

Similarly, contact and personal information for CEOs of major corporations
does not exist online either and any search will turn up spam.

One more example, to add to the many. If you get a genuine wrong-number call
from somebody who made a simple mistake and type their caller ID into the
internet, you'll just get a bunch of reverse-phone-lookup spam, while if you
search for the phone number of a known telemarketer or bill collector, you'll
likely get a full dossier on that company.

------
d4nt
It's interesting that the way of "gaming" Google appears to be in having
thousands of people generating SEO friendly content. I think Google's problem
is that it's pushed SEO to the point where the definition of Spam depends
either on a subjective view of what kind of site the user is looking for, or
it's just mildly worse than something else that's out there (e.g. When I
search for something coding related and get one of the stackoverflow
scrapers).

Where do we go from here? Well, I don't think the answer is just a radically
new way of indexing/ranking websites. That might work in the short term but
the spammers will soon catch up. The answer probably lies in a combination of
better language interpretation, context sensitivity using browsing history and
location, and user profiling based on the social graph and search history. All
of which google seems to be working on.

------
buro9
I love Google products, but I can't help but agree. I'm currently trying to
find a colour laser printer that has good performance (quality vs speed) with
a reasonable running cost over the life of the printer (at least a few years).

All I'm getting is either the manufacturer's slant (PR) or spam sites all
harvesting the same reviews.

To solve this I now look for vertical based search sites. In this case
<http://www.printershowcase.com/small-officecolorlaser.aspx> is the best I've
found... but it's hardly to printers what dpreview is to cameras.

I stick with Google because it largely works well, but when I know what I want
to see and that it must exist but cannot find it... then I find myself looking
elsewhere all the time. DDG and Blekko I use in these cases, but even they're
not solving these kinds of needs.

~~~
stephenbez
I trust Amazon for my product reviews.

~~~
tedunangst
I've found them nearly useless. Way too many people giving one star reviews
because their 15" laptop doesn't fit in a 13" bag. If you take the time to
read all the reviews, you can weed out the idiots, but sifting out the haters
and the astroturfers can take longer than just going to a store and fondling
the merchandise.

------
didip
I just created a blekko account after reading this article (good job TC! It
works this time.)

They seriously need to hire a capable UX person. The logged-in interface is
full of problems:

* Twitter-like status update. I believe this has nothing to do with search.

* Form with 10+ fields on creating a slashtag. You cannot possibly expect me to enter all domain names I could think of into that tiny <textarea>?

* I finally created /python but I have no idea how to improve or update the slashtag. I cannot update that slashtag from search results page.

Overall, very frustrating experience.

------
ams6110
Why would it be so difficult for Google to filter out spam sites? E.g.
DuckDuckGo filters out eHow.com results, because they are low quality and tend
to be spammy.

Oh of course, it's not in Google's interest to do this, because they make
money from the spam sites. So I don't expect Google to really "solve" this
problem.... their trick is to stay useful enough that users don't abandon
them, but allow enough spam into the search results to provide revenue. A
tricky balance...

~~~
moultano
None of Google's ranking decisions are concerned with revenue from AdSense.
Period.

~~~
CamperBob
And I should believe this because....?

~~~
moultano
I work in Search Quality at Google, and this is stated policy. Revenue _from
any source_ (the ads on the results page included) is not a metric used to
make ranking launch decisions.

If me saying so isn't enough evidence for you, consider that it makes sense.
Google knows that losing the lead in search would be much more damaging than
shutting down _all_ of AdSense.

------
meadhikari
Professor, you could've proved your point by linking to at least one example
of how Blekko found a founder's work and listed it by date (as the task
required); instead you have slashtags on health, finance, etc. The truth is
that nobody has arranged that information in the way you want; if it existed
at all, the venture database where you found the 500 companies would've been
the natural place to look. CrunchBase maybe?!

Just a thought worth mentioning.

------
jrussbowman
One of the new things I am working on with unscatter.com is getting quicker
access to reviews and blog posts using the blekko API. The next release will
be a major change, as I've dumped most of the current search providers in
favor of blekko and have moved realtime search to its own page, with analysis
by providing lists of links in the realtime feed.

Nothing is released yet, unfortunately. The site is officially a hobby for me
right now, but I hope to have the new stuff up in the next week or two. I may
just hide the realtime stuff and get the blekko feeds up sooner rather than
later.

Now that I am focusing on building the site to fit my needs, getting
up-to-date info about products and technology (the bulk of my personal
searches) is the top priority. Have to admit the blekko API has helped.

In the meantime, I would suggest that the slashtags /reviews and /blogs with
/date on blekko would be very helpful if you are doing product searches. With
unscatter I am really only providing shortcuts for these, with additional UI
tweaks.

Disclaimer: I am in no way associated with blekko other than having been given
permission to use their api for a personal project.

~~~
jrussbowman
Sorry, meant to say the site is a hobby for me right now. Wrote these comments
with the Swype keyboard on my phone, so I apologize for any problems reading
them.

------
mark_l_watson
I just tried two test queries on blekko and google. Small sample, but there
did seem to be fewer link-bait results on blekko. The issue is whether their
results are close to being as up to date as google's results.

I was interested that blekko seems to have done a lot with a modest amount of
funding.

Also, I wonder if they are getting some monetization with the association with
Facebook.

------
kokon
CMIIW, but is that the reason why Google acquired MetaWeb a few months ago?
I'm expecting to see some improvement on that front.

------
stcredzero
_He couldn’t make head or tail of the results. Paul concluded that “the entire
web is spam when it comes to major appliance reviews”._

A simple solution to this: Consumer Reports. A subscription is well worth it!
The likelihood that it will pay for itself in the next year is very high.

~~~
ams6110
Consumer Reports is probably about as objective a source as you can find, but
I don't believe they are without their biases. They tend to give a lot of
weight to things like value and reliability, and less to aesthetics, though
the latter may be an important factor for some consumers. For tech products,
they review them from the standpoint of an "average consumer" and probably
won't evaluate factors that matter a lot to many readers here.

They are also politically left-leaning, if that matters to you.

~~~
stcredzero
_For tech products, they review them from the standpoint of an "average
consumer"_

Yeah, don't use them to rate tech products. But as far as conventional
appliances go, they are a very good resource.

------
EGreg
Hey, so what you are basically saying is, "the best computer algorithms in the
world" (you know, Google has like > 578690 Ph.D.s) are not good enough to
deliver effective search, so we should introduce the human element.

Fair enough. There is the Open Directory Project (which is pretty old) and of
course there is Facebook, Twitter, and other, human-curated services. Starting
a whole new company to do search and compete with Google (and Bing)? Seems
like a waste of time as Google can just copy what you are doing and
incorporate it into its already massive site (complete with traffic, audience,
and lots of other goodies). Instead, why not get Google to add more social
recommendation and feedback features?

------
apollo
This may be a bit of a tangent, but I want to see the results of the VC system
survey.

------
kmfrk
How does yegg deal with this on DuckDuckGo? A lot of us use his search engine,
and it's a great one at that, which is not worth forgetting.

~~~
epi0Bauqu
By way more aggressively removing useless sites.

------
oliverdamian
How about a P2P search/bookmarking platform where peers could publish
search/bookmarking histories ranked by like/dislike/spam votes which other
peers can subscribe to. Publishing peers can also be ranked according to
number of subscribers. Actually P2P curation could be the next level up from
raw centralised search. Is there anything like this out there already?
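
As a toy model of that subscription-weighted idea (all peer names, subscriber counts, and URLs here are invented):

```python
# Peers publish votes on URLs; each peer's influence is weighted by
# how many other peers subscribe to them.
subscribers = {"alice": 120, "bob": 3, "carol": 40}

# peer -> {url: +1 (like) / -1 (dislike or spam)}
published_votes = {
    "alice": {"http://example.com/review": +1, "http://spam.example": -1},
    "bob":   {"http://spam.example": +1},
    "carol": {"http://example.com/review": +1},
}

def score(url):
    """Subscriber-weighted sum of all published votes for a URL."""
    total = 0
    for peer, votes in published_votes.items():
        if url in votes:
            total += votes[peer == peer and url] if False else votes[url] * subscribers.get(peer, 0)
    return total
```

One open question with this design is the same one PageRank faced: subscriber counts themselves become the thing spammers try to inflate.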

------
Dramatize
I'd like to have the option (like facebook has when you mouse over a post in
your feed) to hide all results from X website.

If you tied that with the ability to follow other people and their search
edits, the number of spammy results could be reduced.

------
DTrejo
<http://duckduckgo.com/> works very well for me.

    - less spam
    - programmer oriented results, when relevant
    - more legible search results

------
klbarry
Isn't the issue, of course, that spammers have no incentive to game other
search engines since they're not worth the time? Any search engine that gets
big will have the problem.

~~~
bambax
Came here to say this but you said it before! ;-)

Also, "crowd-sourced curated lists of websites" sound like the old Yahoo
directory of yore. They will either become obsolete very quickly or spammers
will find a way to penetrate and dominate them.

~~~
jerf
I believe the Yahoo lists were actually centralized. Hypothetically a fully-
distributed approach could work better, in the style of Wikipedia.
Realistically, well, I have no idea but it's worth a try.

~~~
ehsanul
I agree. One particular try failed [1], but that doesn't mean it can't work.

[1] <http://en.wikipedia.org/wiki/Wikia_Search>

~~~
dasil003
Not only that, but Yahoo didn't think search was important, so it's not even a
real try at enhancing search with curation.

The truth is, if some company comes up with a better search engine, whatever
ideas are behind it are not going to sound like an obvious win up front—if
they did, then Google would already be doing that. Instead they'll have to
create a search engine that is better, but somehow antithetical to Google's
business model so that Google can't just copy it, because there's no way for a
startup to come up with enough resources to stay materially ahead of Google in
pure search. And of course that's only half the battle; then you have to be
better enough that users can be bothered to switch (or pull off a browser-deal
coup).

Personally I haven't found the spam problem to be nearly as bad as the echo
chamber makes out. I think silicon valley types just have a good imagination
about how good it _could_ be.

