
Google Needs Sex - McKittrick
http://krugman.blogs.nytimes.com/2011/01/10/google-needs-sex/
======
jerf
Here we see a search engine in his native habitat, foraging the fields of the
world wide web for his sustenance, advertising targets. These beautiful
lumbering beasts play an important part in the ecosystem, processing great
quantities of flora and excreting them into little packets for the creatures
further down the chain.

Master of his niche and with no natural predators, the mighty search engine
faces his greatest threat in Mankind's ongoing habitat destruction, as
slash-and-burn development techniques replace the verdant fields of pages
with cheap plastic knockoffs and gaudy litter, which the mighty search
engine haplessly consumes, growing fat and slow on the empty calories. If
this practice is not soon ended, the search engines may have to be placed
on the endangered species list.

But hark! What's that sound? A female search engine in the throes of heat has
entered the clearing! Her mating call reaches deep into the soul of the male
search engine... "ACK kwire! ACK kwire!" and the male must have her. He
unfurls his spectacular page index count to impress the female, and truly this
is a virile male, for his index count easily reaches into the trillions! A
rare specimen indeed. The female is impressed and approaches the male to
exchange their algorithmic details. Soon the deed is done and, satiated, the
male wanders off in disinterest. Four or five baby search engines will soon
be born, each of which, to thrive in the harsh environment of the web, must
eventually kill and consume its parents, a difficult feat that no baby
search engine has accomplished in many years, but such is the harsh reality
of the untamed wilderness.

Tune in next week when we'll follow the adventures of the cutest little baby
search engine as he grows, encounters his first clickbot, and acquires his
first distracting side business, all before facing his first mortal threat of
acquisition by the trophy-seeking megacorporate hunter who wishes to gut the
young search engine and turn it into something to hang on his intranet to
impress his fellow megacorporates.

~~~
donaq
Bravo! You, sir, are an artist.

[Edit] What I originally wanted to comment on is that the metaphor this
article uses seems invalid. Google is no stationary target. I presume the
engineers working there are not just sitting around getting massages all day
long.

------
mikeklaas
This is quite possibly the dumbest article about technology I've seen in a
mainstream publication.

What Google _really_ needs is a willingness to accept a way higher threshold
of false negatives in weeding out content. I'd love to have a "known good"
version of Google that risked leaving out some content. Let's start by banning
all .info domains along with any that include a hyphen.

~~~
w1ntermute
> This is quite possibly the dumbest article about technology I've seen in a
> mainstream publication.

Come on, it's in the NYTimes & by Paul Krugman (an economist) - not bad for
that combination. After reading lots of articles by technologists, you need to
remember to temper your expectations when you visit the website of a
mainstream publication.

That said, his theory isn't entirely false - Google may actually need some
"outside" ideas to mix in with their own. Then again, perhaps (my instinct
says probably) they've been doing that for years, and the spammers are just
evolving faster.

~~~
scottw
You mean _former_ economist. He's now a columnist, and seems to have since
abandoned most of his critical-thinking skills.

------
melvinram
What the author is saying, as I understand it, is that Google needs a way to
mix up their algo's so they aren't easily "gamed" by scammers and spammers.

The question is not whether Google is doing this but how fast and decisively
they do it, and which specific issues they deem a priority to address.

Google's algo, as I understand it, is an array of knobs that are turned up and
down to increase or decrease impact of various factors such as domain age,
incoming links, quality of content, etc. Google is always adjusting these
knobs and adding/removing knobs. The goals of adjusting these knobs are
obviously known only to Googlers, but what is clear to many is that the sole
objective of adjusting these knobs is not to make the results pages more
relevant. Don't get me wrong... they definitely care about relevancy (or they
wouldn't have the trust of millions) but they are a public company with
obligations to meet so they must take many factors into account when making
adjustments.
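
The knob model described above can be sketched as a weighted sum of ranking
factors. Everything here is a hypothetical stand-in (the factor names,
weights, and page values), since the real signals and weights are known only
to Googlers:

```python
# Toy illustration of the "knobs" model: each ranking factor gets a weight
# that can be tuned up or down; retuning the knobs reorders the results.

def score(page, knobs):
    """Combine a page's factor values into a single ranking score."""
    return sum(knobs[factor] * value for factor, value in page.items())

# Hypothetical knob settings (weights per factor).
knobs = {"domain_age": 0.2, "incoming_links": 0.5, "content_quality": 0.3}

# Hypothetical pages with per-factor values in [0, 1].
pages = {
    "old-but-thin.example": {"domain_age": 0.9, "incoming_links": 0.2,
                             "content_quality": 0.1},
    "new-but-good.example": {"domain_age": 0.1, "incoming_links": 0.4,
                             "content_quality": 0.9},
}

# Rank pages by score; turning a knob (say, raising "domain_age")
# can flip this ordering.
ranking = sorted(pages, key=lambda name: score(pages[name], knobs),
                 reverse=True)
print(ranking)  # → ['new-but-good.example', 'old-but-thin.example']
```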

The point? The sex (aka variation over time) that the author refers to will
likely come from adjusting existing knobs and introducing new knobs such as
social authority, locality, etc... but it doesn't necessarily mean that
problems of relevance as we see it will get fixed through these variations.
Google will take into account a number of factors when deciding which
problems they want to solve and to what extent.

</rambling>

~~~
chrisbroadfoot
> what is clear to many is that the sole objective of adjusting these knobs
> is not to make the results pages more relevant.

You're implying there is a sole objective. There isn't. And if there were
one, it would most certainly be to maximise relevancy.

------
gregpilling
Great headline, crappy article. People have been trying to game search engines
since a few minutes after they were invented. And other search engines don't
appear to be better, or the market share would have shifted by now. It costs
nothing to the consumer to switch, and people talk about which websites they
like - so if something that is more appealing to Joe Public comes along, it
will be noticed. Bing's ascent in the number of searches has not been huge
yet. Personally I hope for more than one search superpower in the future.

~~~
oniTony
It costs nothing for you and me to switch search engines. The cost is a lot
greater for the kind of people who type URLs into Google's search field to
navigate the intertubes.

------
sorbus
One could, in theory, manually create a database of a few million (or way
more) websites, and rate the content of each (including advertisements -
just as the page is shown to the end user). This database could then be
used to train algorithms, with genetic fitness based on how close the ratings
of the algorithm are to the human ratings of value (note that the algorithms
would not be aware of the human value ratings).

A second - perhaps slightly smaller - database could then be used to test the
performance of the best algorithms in the "real world," or at least on data
which they weren't trained on. This would select against algorithms which are
adapted solely for the first database. Content ratings generated
algorithmically could then be used to modify the ranking of websites in the
results, penalizing websites that seem to have bad content.
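
A rough sketch of the two-database scheme described above, with a simple
mutate-and-keep search standing in for full genetic evolution. The synthetic
"human" ratings and the linear form of the candidate algorithms are
assumptions for illustration only:

```python
import random

random.seed(0)

def rate(weights, features):
    """A candidate algorithm: a weighted sum over page features."""
    return sum(w * f for w, f in zip(weights, features))

def fitness(weights, dataset):
    """Negative total error vs. the human ratings (higher is better)."""
    return -sum(abs(rate(weights, feats) - human) for feats, human in dataset)

# Synthetic stand-ins for the two human-rated databases:
# (page features, human rating of value).
train = [((random.random(), random.random()), random.random())
         for _ in range(200)]
holdout = [((random.random(), random.random()), random.random())
           for _ in range(100)]

# Evolve: mutate the best candidate, keep the mutant only if its fitness
# on the training database improves. The candidates never see the human
# ratings directly; they are scored against them from outside.
best = initial = (random.random(), random.random())
for _ in range(500):
    cand = tuple(w + random.gauss(0, 0.1) for w in best)
    if fitness(cand, train) > fitness(best, train):
        best = cand

# The second, held-out database then checks whether the winner actually
# generalises, selecting against algorithms overfit to the first database.
print(fitness(best, holdout))
```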

I'm not sure how I got here from thinking about sex between search engines; I
suppose it's because one way to deal with the evolution would be taking the
best few algorithms each time and combining them (which I'm sure there are
issues with). Of course, people far more intelligent than I have certainly
had this idea before, and probably figured out why it wouldn't work (or,
alternatively, that it would work, which would suggest that someone is now
busy implementing it).

~~~
beoba
The catch is this database would need to be kept away from spammers, lest they
be able to test their site designs against it directly.

~~~
sorbus
Not so much the database as the algorithm; if it's sufficiently understood (as
seems to be the case with Google's algorithm, or at least as a lot of people
claim), then spammers can target it directly. Merely having
the original data used to train it doesn't give much insight into the
algorithm itself.

On second thought, though, being able to identify common characteristics of
the least spam-like websites would allow spammers to mimic those
characteristics. It would take a lot of effort (figuring out the core bits),
but they are clearly willing to put that in. So yes, I suppose that you're
right.

------
cabalamat
If I was a web search company, I'd allow users to upvote or downvote their
search results (this would increase or decrease their prominence on subsequent
searches). This could be done on a per-site, or per-page basis.

Then I'd use one person's preferences to alter how other people receive search
results (on an optional basis; if people didn't want their results filtered
like this, they wouldn't have to.)

But I wouldn't just use an average of all users; it'd be too easy for spammers
to create fake accounts to upvote spam. No, a user's search results would only
be affected by what their friends upvote and downvote (or possibly their
friends of friends as well).
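
The friends-only vote weighting could look something like this; the user
names, vote values, and flat summing scheme are all illustrative
assumptions:

```python
# Sketch of friends-only vote weighting: a user's results are reranked
# using only votes cast by their friends, so a spammer's sock-puppet
# accounts have no effect on anyone who hasn't befriended them.

friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "spammer": {"sock1", "sock2"},
}

# votes[user] maps a site to +1 (upvote) or -1 (downvote).
votes = {
    "bob": {"good-site.example": +1, "spam-site.example": -1},
    "carol": {"good-site.example": +1},
    "sock1": {"spam-site.example": +1},
    "sock2": {"spam-site.example": +1},
}

def adjustment(user, site):
    """Score boost for `site` from `user`'s friends' votes only."""
    return sum(votes.get(f, {}).get(site, 0) for f in friends.get(user, ()))

print(adjustment("alice", "good-site.example"))  # → 2 (bob + carol)
print(adjustment("alice", "spam-site.example"))  # → -1 (bob's downvote)
```

The sock-puppet upvotes only raise the spam site for the spammer's own
searches; extending the sum to friends-of-friends would just mean iterating
one level deeper.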

This would make it in a user's interests to link to their friends and have
their friends rate websites, as everyone uses web search. So people would want
to promote the search engine to their friends.

To give people more of an incentive to proselytise the search engine, I'd add
social features such as a twitter-like service (allowing public, friends-only,
and named-recipients-only messages), chat (text, voice and video-voice), an
extended-length messaging service (you could call it, I don't know, a "blog"
or something) that also allows pictures, and RSS feeds of one's own and one's
friends' public entries.

I'd also add a "fan" feature (intransitive, as opposed to friendship which is
transitive). People could create lists rating websites, and others could fan
those lists.

Maybe someone like DuckDuckGo might want to implement something like this?

(Incidentally, DDG market themselves as being privacy-friendly, which coupled
with the recent subpoena of Wikileaks' Twitter data, suggests there may be an
opportunity for a competitor to Twitter that is more privacy-minded).

~~~
dennisgorelik
"I'd allow users to upvote or downvote their search results" Google already
did that. But for some reason they were unable to efficiently use results of
that project. I think something is broken in Google's mid-management.

~~~
natrius
I'd guess that the clickstream data can yield conclusions that are nearly as
accurate as manual voting, if not more so. It's not worth cluttering the
interface for data with such low marginal value.

~~~
dennisgorelik
Using clickstream data to inform ranking is a must, but it's not enough.
Manual rating adds value to website rankings. Google Toolbar's data adds
even more value. I'd say all that click data should affect search rankings
even more than hyperlinks do.

------
jakeg
Google already has "search engine sex"; it's called revision control
(although I really doubt this is what Krugman had in mind). To solve Google's spam
problem, they either need better business priorities or smarter engineers,
depending on which popular explanation of Google's spam problem is accurate.

> And the most persuasive answer, as I understand it, is defense against
> parasites.

More likely it's the general ability to merge, in a single generation, two
or more highly advantageous adaptations into one individual, which could
include parasite defense but also everything else.

------
tapiwa
I have been thinking a lot about this problem. If indeed it is a problem.

Google is in the business of serving ads. The vast majority of those spammy
sites display google ads. So, there is little incentive for google to change
things just yet. Joe public is not complaining, yet. It is only the digerati,
and a whole bunch of other webmasters who think their sites should be ranking
higher because _they_ are just better, who seem to be up in arms over this.

The interesting thing is that half the spammy sites do provide content that
is 'just good enough' for what most people are looking for. The quickest
exit from one of these sites is via a google ad, so google wins, the site
wins, and the advertiser targeting a specific niche wins.

The only problem will come when the advertisers stop getting bang for their
buck when their ads are displayed on these sites. Until then, Google has
little incentive to change.

~~~
tapiwa
Ahh, I forgot to add the 'solution'.

The 'sex' bit, will be marrying google search results with other sources, and
breeding a search ranking algorithm specific to you.

For example, the Facebook 'like' button could turn out to be very useful.
Stuff liked by more people gets ranked higher. Since webmasters are already
gaming Facebook, though, stuff liked by people who are one or two degrees of
separation from you on Facebook could count more. Any social network that
allows you to indicate some sort of trust in an individual could work.

I bet you though, that if Google did this, the very same people complaining
about the bad results now, would be the first to complain about the privacy
implications of all this, and how google seems to know so much about us ....
_sigh_

------
dennisgorelik
The author is dead wrong about the benefits of sex. The key benefit of sex
is the ability to preserve a huge gene pool and try out multiple
combinations. The battle against parasites is not worth the huge
reproductive complexity that sex introduces.

On the other hand the author is right that Google is inefficient in fighting
web spam.

~~~
dnautics
yeah, especially considering that a huge chunk of the species on this planet
reproduce clonally just fine.

------
michaelcgorman
I think he's suggesting a genetic algorithm... Google-style organisms,
competing for fewest retried searches, most-different front-page results,
etc., that over time merge and split in various ways. To me that seems
pretty similar to some of the tactics Google's already trying.

------
ivankirigin

      I’m not quite sure what search-engine sex would involve. 
      But Google apparently needs some.

Wow, Krugman spouts gibberish when he isn't talking specifically about
economics. This post is meaningless.

------
robg
"I’m not quite sure what search-engine sex would involve. But Google
apparently needs some."

Picture a duck (<http://duckduckgo.com/>).

~~~
nostrademons
Eww.

~~~
mattblalock
I said to myself, whatever, DuckDuck's the shit. Then I searched a keyword
relevant to my industry, the first five results:

niche.gr niche-factory.fr cn-niche.com myspace.com/russianniche niche.kz

Right, then. Guess the problem isn't specific to Google...

------
andrewljohnson
Tech journalist to the rescue.

------
dnautics
"Why doesn’t nature just engage in cloning?

... If each generation of an organism looks exactly like the last, parasites
can steadily evolve to bypass the organism’s defenses."

Why doesn't nature, indeed?

<http://en.wikipedia.org/wiki/Asexual_reproduction>

------
dotBen
this article screams of linkbaiting _(I guess it worked, it got onto HN and I
checked it out)_.

Author concludes that "Google needs sex" because sexual reproduction defends
against parasites in a way cloning doesn't (not strictly true, but
whatever), but then signs off with "I don't know what Google sex looks
like".

Duh.

------
invisiblefunnel
Binglehoo

~~~
obluda
g + m$ = BOOM

