
Algorithmic search is sinking - McKittrick
http://www.skrenta.com/2010/11/algorithmic_search_is_sinking.html
======
jeremydavid
"The only way to combat this and return trust and quality to search is by
taking an editorial stand and having humans identify the best sites for every
category."

There are billions of webpages. Who is going to do this review?

Is someone honestly going to review
[http://stackoverflow.com/questions/4300234/how-might-union-f...](http://stackoverflow.com/questions/4300234/how-might-union-find-data-structures-be-applied-to-kruskals-algorithm)
and put it in the category of "How Union/Find data structures can be applied
to Kruskal's algorithm?"?

No.

The closest thing to an editorialized web is www.dmoz.org, and that hasn't been
properly updated in years (and never will be) because it failed.

Search has to be done with algorithms - there are just too many search queries
to do it any other way. Udi Manber, Google’s VP of Engineering, stated that
20-25% of all queries made each day have never been seen before:
[http://www.readwriteweb.com/archives/udi_manber_search_is_a_...](http://www.readwriteweb.com/archives/udi_manber_search_is_a_hard_problem.php).

~~~
evgen
_The closest thing to an editorialized web is www.dmoz.org, and that hasn't
been properly updated in years (and never will be) because it failed._

And, noting the circular irony one often sees, Rich Skrenta created a Yahoo
knock-off in the bubble days called NewHoo and then sold it to Netscape, where
it became the seed of dmoz...

~~~
btilly
Well then he has been at this for a lot of years, and perhaps now knows how to
do it. :-)

------
tptacek
... says the guy with the (crowdsourced) curated search engine.

~~~
prakash
pretty much anyone who has switched over to some other search engine as their
primary did so because Google's algorithmic search has been sinking for the
past couple of years.

~~~
JoachimSchipper
Given
[http://www.comscore.com/Press_Events/Press_Releases/2010/4/c...](http://www.comscore.com/Press_Events/Press_Releases/2010/4/comScore_Releases_March_2010_U.S._Search_Engine_Rankings),
I don't think there are many people who use a "non-algorithmic" search engine
- or do you count Bing?

------
DrJosiah
Use sentiment analysis to discover the intent of a link, and whether the
destination should get more link juice. Positive sentiment: positive link
juice. Negative sentiment: zero link juice.

Alternatively, for negative reviews, etc., use rel="nofollow".

To claim that algorithmic search is dead completely ignores the _volume_ that
Google is doing, or the fact that they are making $billions in algorithmic
search and Ad placement. How much do curated places make?

Also, not to rain on anyone's parade or anything (just kidding, I'm going to
rain it down) it would take decades of 10k people churning through pages to
get even 1% of the _new_ content that Google discovers _daily_.

You all saw the 24 hours of unique video uploaded to YouTube every minute of
every day figure from a year or two ago, right? Imagine that, only text, and
produced by 10x-1000x as many people at 10-1000x the volume posting to forums,
newsgroups, social networking sites, blogs, etc., every minute of every day.
Because of this, you can't just review a site, you have to review the content
on each page of the site. That's going to kill any curated engine in the long
term.

------
jules
Ironic that he's proposing to have people solve a problem that arose because
people were being manipulated. Of course, your people cannot be manipulated.

No, the solution to this problem is that GetSatisfaction et al use
rel=nofollow. It's as simple as that. And arguably Google could improve its
algorithm by taking negativity into account.

~~~
McKittrick
yes, this is all getsatisfaction's fault. surprised nytimes missed that angle.

~~~
wallflower
Their response:

[http://blog.getsatisfaction.com/2010/11/28/when-businesses-a...](http://blog.getsatisfaction.com/2010/11/28/when-businesses-attack-their-customers/)

~~~
prodigal_erik
see also <http://news.ycombinator.com/item?id=1948934>

------
Vivtek
I have heard about search engine spam before and sort of discounted it - but
you know, if you search on something that's not a technical topic or something
equally specific like a band name, that is, you're searching on a general
topic that is of interest to the mundanes, then there really is a whole lot of
spam on Google.

My case from this week was that I wanted plans for a bookcase. I searched,
therefore, on "build a bookcase". There was exactly one useful link on
Google's front page (a Popular Mechanics link), and the rest were regurgitated
spam that I could improve on with a Markov chain algorithm.

I've read that as long as people click on ads, Google has no motivation to
clean up spam, but surely this can't be the best even for Google?

~~~
TheCoreh
That's strange. My Google results page for "build a bookcase" shows 8 high
quality tutorials on how to build bookcases, besides two pretty decent video-
tutorials.

~~~
smackay
I concur. I got to the second page before I found anything other than a first
class result (incidentally it was how to build a bookcase in 5 minutes, which
seems to be a dubious proposition at best). Now if you are looking for
information on something you actually want to buy then yes, Google's results
are indeed a sea of spam.

------
mixmax
Or maybe our algorithms just aren't good enough.

Suppose you use bayesian filtering on the text surrounding the links to
determine whether the connection is good or bad. With a reasonable amount of
data it should be possible.

 _Note:_ I'm not an algorithms guy, I do business and strategy and a wee bit
of programming, so maybe the example isn't good, but I think the point is.
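
A rough sketch of what Bayesian filtering on link context could look like, assuming a tiny hand-labelled training set. The function names, the labels "good" and "bad", and the example snippets are all made up for illustration; this is plain naive Bayes with add-one smoothing, not anything a real search engine is known to run.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label from (text, label) pairs; label is 'good' or 'bad'."""
    word_counts = {"good": Counter(), "bad": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the higher naive Bayes log-probability."""
    vocab = set(word_counts["good"]) | set(word_counts["bad"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("good", "bad"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior for this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical text found around links, labelled by hand.
EXAMPLES = [
    ("great service highly recommend this store", "good"),
    ("excellent tutorial very helpful", "good"),
    ("terrible service avoid this scam", "bad"),
    ("rude staff never buy from this store", "bad"),
]

model = train(EXAMPLES)
print(classify("avoid this terrible store", *model))
```

With real data the hard part would be getting enough labelled link contexts, not the classifier itself.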

~~~
ddemchuk
Google already analyzes backlinks in their context to determine how relevant
the anchor text is to the topic of the page.

Determining sentiment (the topic of the NYT piece) is considerably harder
though, because it would allow for spammers to write negative articles about a
site and link to it and negatively affect its rankings. Also, determining the
tone/emotions of a piece of text is probably one of the hardest things to do
with textual analysis.

~~~
alextgordon
_Determining sentiment (the topic of the NYT piece) is considerably harder
though, because it would allow for spammers to write negative articles about a
site and link to it and negatively affect its rankings._

This could be solved by making sentiments act as a weight (i.e. a multiplier
in [0, 1]). Positive sentiments would give a particular reference more weight,
negative sentiments would give little to no weight. Then it would be
impossible to negatively affect a site's rankings - only positively affect
them. Just like now.
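
A minimal sketch of that weighting, assuming each inbound link comes with a non-negative base value and a sentiment score in [-1, 1] from some hypothetical classifier. The function name and the numbers are illustrative only.

```python
def link_score(links):
    """Sum link values, each scaled by a sentiment weight clamped to [0, 1].

    `links` is a list of (base_value, sentiment) pairs. base_value is assumed
    to be non-negative; sentiment is assumed to lie in [-1, 1].
    """
    total = 0.0
    for base_value, sentiment in links:
        # Negative sentiment clamps to 0, so it contributes nothing
        # rather than subtracting from the total.
        weight = min(1.0, max(0.0, sentiment))
        total += base_value * weight
    return total

honest = [(1.0, 0.8)]
attacked = honest + [(5.0, -0.9)]  # a spammer adds a hostile link

print(link_score(honest))
print(link_score(attacked))
```

Both calls give the same score: the hostile link is simply ignored, so a negative write-up cannot push a site down.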

------
jeffmiller
The core problem with having humans identify the best sites is that it doesn't
scale. It's probably ok for big topics like travel or healthcare, but it
shafts those users who are searching for long tail topics.

------
idheitmann
The mystery of the PageRank algorithm is not only a defense against gaming,
it's a defense against competition. Other than stylistic differences (a la
Bing), it seems difficult to differentiate a new service when nobody
understands the details of the standard one.

As a net addict, I regularly find myself frustrated because I can't figure out
how to get meaningful information out of Google instead of sites trying to
sell me. And if I can't think off the top of my head of a website that will
act as a relevant portal for that kind of info, then there isn't really any
alternative to Google.

At least, not that I know of yet: can anyone suggest one?

Google has done amazing things for our ability to get what we want and fast,
but it also is slowly eroding our independence from it and our ability to
educate ourselves by other means.

Here's hoping they prove worthy stewards once they own all the information on
the planet.

------
zmmmmm
An awful lot seems to be getting made out of this one story, and there's
really precious little else cited in the post other than generic claims of
gloom and doom about search. Google's been fighting spam sites for a long time
before this and the battle certainly waxes and wanes but I'm sceptical that
it's actually being lost; it's just a constant struggle.

Now if you tell me that there is value in social search we could have a
totally different discussion, but it's more about the persuasive power of
personal recommendation than algorithms not working any more.

------
fonosip
it is sinking, but for a different reason. the web is getting away from
google. getting locked up in apps, or walled gardens like facebook or itunes

~~~
prodigal_erik
This. The open content web is beginning to disintegrate, being displaced by
siloed apps which only incidentally happen to involve HTML and HTTP.

------
jellicle
I don't know if Skrenta's approach is perfect (can spammers make slashtags?
I'll bet they can!) but Google's is clearly failing.

Giant swathes of Google searches are now overrun with datafog spammers. Ehow,
squidoo, hubpages, wikihow, buzzle, how-wiki, ezinearticles, bukisa, wisegeek,
articlesnatch, healthblurbs, associatedcontent - all these and thousands more
domains filled with spam semi-automatically generated by legions of Indians
for a few cents per page.

There's not one word of useful information on any of those domains. But
apparently they serve a lot of ads for Google, so they don't get delisted.

~~~
sandGorgon
I invoke SandGorgon’s law of outsourcing analogies

 _As an online discussion about PROGRAMMING grows longer, the probability of a
comparison involving outsourcing or Indians approaches 1, if Godwin’s law has
not already been satisfied_

~~~
cosgroveb
I really don't see that many comments about Indians and outsourcing in
programming discussions here. Or maybe I am not noticing them?

~~~
reitzensteinm
Also, the law is pretty much a tautology since it doesn't define a time
scale... eventually, pretty much everything gets said.

"As an online discussion about PROGRAMMING grows longer, the probability of a
discussion of traditional medicine and spiritual beliefs surrounding
childbirth in ancient sub Saharan African tribes approaches 1."

~~~
sandGorgon
As Godwin's law itself clarifies:

 _Godwin put forth the sarcastic observation that, given enough time, all
discussions—regardless of topic or scope—inevitably end up being about Hitler
and the Nazis._

------
JoachimSchipper
To everyone talking about "sentiment analysis": that's not easy. Sentences
like "John has stupidly said foo[1], and even went so far as to say bar[2]
(which was demolished by Jane[3] and Jan[4]); he's now capitulated[5]" would
be quite difficult to parse. The following articles may also be instructive:
all by the same author, all quite critical, but with links with quite
different intentions.

[http://scienceblogs.com/goodmath/2009/05/dembski_responds.ph...](http://scienceblogs.com/goodmath/2009/05/dembski_responds.php)
[http://scienceblogs.com/goodmath/2009/12/id_garbage_csi_as_n...](http://scienceblogs.com/goodmath/2009/12/id_garbage_csi_as_non-computab.php)
[http://scienceblogs.com/goodmath/2009/08/quick_critique_demb...](http://scienceblogs.com/goodmath/2009/08/quick_critique_dembski_and_mar.php)

------
maheshs
>>Algorithmic search is sinking

I think we need a better algorithm.

------
ergo98
There is little rigor behind most of the claims of the NYTimes story: The
targeted site already negates any pagerank benefit of their links (they do
implement nofollow), and the definitive example seems to be nothing more than
good SEO of the site in question (most of the other front and second page
sites are pretty mediocre as well, clearly with little web competition in the
keyword space).

In any case, go to a shopping-specific (sub)site if shopping. A Google search
is a terrible way to find either products or retailers.

~~~
klbarry
Note: I do SEO as part of my job, so I know a few tools that can look through
this. Google keyword checker gives 590 searches a month for that phrase, so
it's not too competitive. I'm sure he ranks for a lot of these tail phrases
though.

A lot of his juice comes from every page (seems to be over 10,000 according to
Yahoo Site Explorer) on his site linking with good anchor text to every other
page. The fact that he ranks so low (on my Google he's number 6 or so) even
with this on such an easy term shows something, doesn't it?

------
rorrr
So there's some shitty retailer. What does this have to do with algorithmic
search?

~~~
cantbecool
Well, Google uses algorithms to rank websites; the retailer's unprofessional
practices are gaming Google's algorithms by increasing the number of inbound
links pointing to their site, which increases their page rank on a search
return on Google and increases traffic.
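
To make the inbound-link effect concrete, here is a small sketch using the textbook PageRank power iteration. Google's real ranking is not public, so treat this only as the published idea; the page names and the tiny graph are invented for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `graph` maps every page to the list of pages it links to. Every page
    must appear as a key, even if its list of links is empty.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small baseline share from the random jump.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in graph.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Before: the complaint pages do not link to the retailer.
before = {
    "retailer": ["a"],
    "a": ["b"],
    "b": ["a"],
    "complaint1": ["a"],
    "complaint2": ["a"],
}
# After: the same complaint pages now link to the retailer.
after = dict(before, complaint1=["retailer"], complaint2=["retailer"])

rank_before = pagerank(before)
rank_after = pagerank(after)
print(rank_before["retailer"], rank_after["retailer"])
```

In this toy graph the retailer's score goes up once the complaint pages link to it, regardless of what those pages actually say.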

