
What you could do if you were Google and had their databases - jacquesm
http://www.jacquesmattheij.com/What+you+could+do+if+you+were+google+and+had+their+databases
======
ot
This is a very good analysis, and I would like to add some historical
information to put it in context.

Originally, link analysis was based on a simple observation: people were
maintaining curated lists of bookmarks in their personal webpages, because
before search engines it was the only way to remember relevant webpages; I'm
sure that most people remember that every personal site had a "my bookmarks"
webpage. Therefore, every link counted as a "vote" from that person to a page.

AltaVista exploited this by ranking pages by the number of inlinks. It wasn't
long until people figured out that this was very easy to game, so someone came
up with the idea of "transitive influence": a site is influential if other
influential sites vote for it. Algorithms such as PageRank and HITS largely
solved that spam problem.
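The "transitive influence" idea can be sketched in a few lines; the graph,
damping factor, and iteration count below are illustrative choices, not
anything from an actual search engine.

```python
# A minimal power-iteration sketch of PageRank-style transitive influence:
# a page's score is spread to the pages it links to, iterated to a fixed point.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small "teleport" share, plus shares from inlinks.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# A page nobody influential links to ("spam") gains little by merely
# existing, even though it casts a vote of its own.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "spam": ["hub"],
}
ranks = pagerank(graph)
```

The point of the iteration is exactly the spam resistance described above:
a vote only counts for much if the voter itself accumulated influence.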

However, with the evolution of the web, the meaning of links changed. For
example, most links are automatically generated by CMSs. Also, there is more
and more ephemeral information on the web, and for ephemeral information as
soon as you have enough links to it to evaluate its quality, the information
is already old and irrelevant.

Luckily, if you are the most used search engine, you have another important
popularity signal, which is given by your users: the number of clicks to a
page for a given query.
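As a toy illustration of that signal, here is one way to aggregate clicks into
a per-(query, page) click-through rate; the log format is a made-up assumption,
not any real search engine's schema.

```python
# Hypothetical click log: (query, page, clicked) per search impression.
from collections import Counter

def click_rates(log):
    """Return click-through rate per (query, page) pair."""
    impressions = Counter()
    clicks = Counter()
    for query, page, clicked in log:
        impressions[(query, page)] += 1
        if clicked:
            clicks[(query, page)] += 1
    return {key: clicks[key] / impressions[key] for key in impressions}

log = [
    ("jaguar", "wikipedia.org/wiki/Jaguar", True),
    ("jaguar", "wikipedia.org/wiki/Jaguar", True),
    ("jaguar", "jaguar.com", False),
    ("jaguar", "jaguar.com", True),
]
rates = click_rates(log)
```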

In fact, in today's search engines I would say that the influence of link
analysis on the overall ranking has become smaller and smaller, and it is
probably applied mostly at the domain level rather than to each single page.

However, with the advent of social networks, a large fraction of clicks
doesn't come from search engines anymore, but from social "shares". This means
that the search engine cannot observe them anymore, losing a precious
popularity signal (together with "Like"s and "+1"s).

I'm not sure if this is what Google is after with Google+, but most probably
the click and share data on Google+ affects (or will affect) the search
rankings.

------
Estragon
I recently attended a talk at Cornell by a guy who is due to start working at
Google soon. He had not yet started working there, but presumably he was hired
on the basis of the work he described, and he was explicitly interested in
applying his framework to Google's social network.

He was working on a scheme to get people to recommend things to their friends
in return for discounts on those things. His central focus was on devising a
payout scheme in which people's economically optimal strategy is honestly
reporting the cost of making the recommendation, and then choosing the optimum
set of people with whom to make this deal for a given budget. Optimum in the
sense of influencing the most people with the recommendations.

~~~
anonymoushn
Is there a reasonable means to prevent people from spamming their friends to
get the discount and then saying, over some more trusted channel, "Please
disregard the spam, I haven't tried this thing and don't know whether it is
good yet"?

~~~
InclinedPlane
That's small compared to the problem of arbitrage in such a scheme. Start by
spamming people to buy something which gives them the largest dollar value
discount on an item they don't actually want, then sell part of the discount
to someone who wants it and pocket the profit.

------
bencpeters
Definitely makes some sense. I don't know that this is actually a bad thing
though. I personally care more about quality (read: relevant) search results
than the details of how they are created. If Google is able to use a social
network like Google+ to legitimately improve their search offering so that I
have an easier time finding something relevant and interesting using their
search engine when I need it, then I'm ok with that.

Of course, this is predicated on the assumption that google is able to
effectively use Google + to tweak their main algorithms. I don't want to be
barraged with direct G+ links in my search results - I want the relevant
websites themselves. Furthermore, there should be a way to opt-out if people
are uncomfortable with the lack of privacy, even if the default is their
current privacy policy.

------
3pt14159
Not sure it would be legal, but I would put a significant number of devs onto
predicting stocks. A blog goes more silent, or people start Googling for "what
is fraud, exactly?" from IPs known to contain lots of IBM workers (that would
be more on the illegal end, for sure).

Obviously a lawyer would need to review every input, but they have so much
machine learning expertise and data that they could rock the exchanges.

~~~
mrich
I recall Eric Schmidt saying they thought about this, but the lawyers told
them it would be illegal so they didn't explore it further.

~~~
3pt14159
At what point does it become illegal, though? Surely if Goldman can trade,
there must be some level of aggregation at which they are allowed to trade.
"Debt consolidation" search trends, for example: if searches spike in a
certain country, they could make trading decisions based on that.

Seems like so much untapped potential.

------
dustingetz
I think this is what jacquesm is trying to say:

page-rank's design is open to being gamed, which has led to an arms race of
algorithm tweaks. A social graph has much more powerful signals for figuring
out relevant search results.

this isn't a hypothesis. the first iteration is called "search plus your
world" and everybody already knows about it.

~~~
euroclydon
Google, in many cases because of GA, knows a user's behavior once they reach a
site. I wonder if, when ranking search results, they take into consideration
how long a user stays on a site they found in search, or how active they are
at that site. Wouldn't these metrics be an indicator of quality?
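As a speculative sketch of that idea, dwell time can be estimated from a
timestamped visit log; the session format and numbers below are invented for
illustration.

```python
# Estimate dwell time from one search session: the dwell on a page is the
# time until the next recorded event (e.g. returning to the results page).

def dwell_times(events):
    """events: time-ordered list of (timestamp_sec, url) visits."""
    dwells = {}
    for (t, url), (t_next, _) in zip(events, events[1:]):
        dwells[url] = t_next - t
    return dwells

session = [
    (0, "serp"),            # search results page
    (2, "spammy.example"),  # clicked, bounced quickly
    (5, "serp"),            # back to the results
    (7, "useful.example"),
    (300, "serp"),          # long dwell suggests a satisfying page
]
d = dwell_times(session)
```

A quick bounce back to the results page (sometimes called "pogo-sticking")
would read as a negative quality signal under this kind of metric.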

~~~
vibrunazo
According to Danny Sullivan's periodic table, they do measure that, and it has
a weight of "2" in his model. See the "Ce" tag:

<http://searchengineland.com/seotable>

------
chmike
I also believe that this is one of the potential benefits of G+. Though this
can also easily be rotted by SEO if they use fake people or mechanical turks.
This may explain the effort to enforce identifying the real person behind an
account. Facebook is going for it, and Google too. And all this just to be
able to throw bigger shovels of ads at us.

------
leejw00t354
Interesting, but I don't think Google wanted to go social to decrease the
damage of link spamming.

People already do social network spamming. There are plenty of sites where you
can pay for a certain number of +1's or likes.

Using a combination of inputs, social interaction, page links, keywords, they
can achieve a better overall ranking algorithm, which is probably one reason
they wanted to go social.

However, I think the main reason Google wanted to go social was to be able to
categorise their users better. The more they know about their users, the
better they can serve ads.

~~~
pavel_lishin
> People already do social network spamming. There are plenty of sites where
> you can pay for a certain number of +1's or likes.

Not all +1's are created equal. A +1 from a close friend or a respected public
figure is worth a lot more than ten thousand +1's from accounts created in the
past 48 hours from a Bangalore IP address.
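As a toy model of that weighting, with trust scores invented purely for the
example:

```python
# Weight each +1 by a per-account trust score, so a handful of trusted
# votes outweighs a flood of freshly created accounts.

def weighted_plus_ones(votes, trust):
    """votes: list of account ids that +1'd; trust: account id -> weight."""
    return sum(trust.get(account, 0.0) for account in votes)

trust = {"close_friend": 1.0, "public_figure": 2.0, "fresh_account": 0.0001}
organic = weighted_plus_ones(["close_friend", "public_figure"], trust)
spam = weighted_plus_ones(["fresh_account"] * 10000, trust)
```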

~~~
lurker14
But in Google's framework, the only kinds of +1 are "your immediate friends"
(those appear with profile pictures alongside) and "all the rest", which
appear in the aggregate total: "822,251 people +1'd this".

------
kiloaper
>What you could do if you were Google and had their databases?

Try to predict election results and then try to influence them. Not exactly
"Do no evil" I'll admit. Imagine what Google PR could do for politicians with
access to what people are talking about, watching, browsing and chatting about
in their social network. With their ad technology they're already halfway
there.

~~~
pavel_lishin
> Try to predict election results and then try to influence them. Not exactly
> "Do no evil" I'll admit.

Depends on who they're backing, right?

But even then, that's subjective. What's more evil, abortion or forced
transvaginal ultrasound? Pollution, or unemployment? Kang or Kodos?

------
ridiculous_fish
It may be that the web has rotted in the way described, so that it takes
longer for users to find what they want. But I would argue this dynamic
actually benefits Google: searching takes longer, so users spend more time
using Google's services.

So there's this perverse incentive to make search as crappy as possible, right
up to the limit where users switch to another search engine. And the great
thing about this "web rot" (from Google's perspective) is that it affects all
search engines, so users won't switch.

I wouldn't go so far as to say that Google is intentionally pursuing this
"crapification" strategy, but they may not worry about the web rot as much as
the author thinks, at least in the short term.

But while decreasing search result relevance may be neutral or even
beneficial, increasing ad relevance is a big benefit for Google. And that is
where social comes in: it allows for more targeted ads.

~~~
eurleif
You're assuming total search volume will stay constant regardless of quality.
But if people don't expect to find things of value as easily, they won't
search for as many things. Not worth sorting through a bunch of crappy
results.

I'm sure Google knows that its value is based on the Web's value.

~~~
ridiculous_fish
True. There's a Laffer Curve of Search with a revenue-optimizing level of suck
somewhere between 0% and 100%.
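As a toy model (both curves invented), revenue as the product of falling
search volume and rising desperation ad clicks does peak strictly between the
extremes:

```python
# "Laffer Curve of Search": volume falls as quality drops, while ad clicks
# per search rise; the product is maximized at an interior level of suck.

def revenue(suck):
    volume = 1.0 - suck    # users give up as quality falls
    ads_per_search = suck  # worse results, more desperate ad clicks
    return volume * ads_per_search

# Search suck levels from 0% to 100%, and the revenue-optimizing one.
best = max(range(101), key=lambda pct: revenue(pct / 100))
```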

------
cr4zy
This post keys into an idea I had recently to combine Klout with
anonymized/aggregated web history data (i.e. from a browser extension) to
build a URL quality/trending score. Google could obviously do this with
Chrome, but I'd like to see a more open method of distributing this data to
allow anyone to build on top of it. In the same vein, I'd like to see an open
service that websites could ping when they want a page to be indexed. This
would allow for a more open and efficient crawling ecosystem than currently
exists. In general, development and ownership of search and social should
become more open and distributed just as telephony, electricity, operating
systems, and so many more large industries before it.

~~~
amitamb
> In general, development and ownership of search and social should become
> more open and distributed just as telephony, electricity, operating systems,
> and so many more large industries before it.

I am working on an effort to do that, called VerticalSet. It is a search
engine/platform that lets developers change the search experience the way
they want, while giving users the best search experience.

<http://www.verticalset.com/>

~~~
cr4zy
Cool! So an open front end to search. This is why I love HN.

------
alexhaefner
This article goes along with, "What you could do if you were Apple and had
their cash", and other greats like, "What you could do if you were a genie".

Just saying.

~~~
micaeked
nope

------
bisserlis
> people that relied on links will now rely on your search engine reducing the
> value of links between sites

Is this really a given? I've never heard this before, and I'm not sure what it
means. Is the number of links generated by non-SEO sources falling (or not
rising proportionally with SEO links)?

------
praptak
Alternative answer to the question posed in the title: blackmail people by
threatening to disclose their porn viewing habits (the search engine) to their
friends and relatives (the social network.)

------
gcmazza
I would do behavior-based marketing that was relevant to individuals and/or
segments. Manufacturers would be really interested in providing great offers
if they could reach opinion leaders and certain segments to gain incremental
business. It would need to be opt-in, as I say, so it's not intrusive but
really beneficial.

------
brianfryer
> a number of guide users were followed closely

No. No following people around on the Internet to make search results
"better".

------
j_s
Sorry, I missed what the article was trying to say... am I correct that the
TL;DR version is "social search" ?

~~~
cwe
Nope, the TL;DR version is "turn G+ users into a better crawler/algorithm"

~~~
j_s
Thanks; I guess I see that as the same thing (guessing that Google sees social
search as an improvement).

------
brown9-2
Reminds me of Cory Doctorow's (fictional) short story Scroogled:
<http://www.radaronline.com/from-the-magazine/2007/09/google_fiction_evil_dangerous_surveillance_control_1.php>

~~~
jrockway
The story where it's Google's fault that a new government took over that made
pretty much every past, present, and future action illegal.

------
jdc
Facebook uses their social graph to figure out what sort of needs you have.
Maybe Google could try the reverse process; that is figuring out your social
graph using their knowledge of what web content you need. Kinda evil though.

------
cloudwalking
I presume this gives me access to all of their servers too. In which case I
would mine the remaining Bitcoin. Mainly because I think Bitcoin is a stupid
idea and this would help highlight that sentiment.

------
ck2
You could feed the NSA data on about 80% of the world population?

------
alecco
The web's flaws propagate to Google. They can only fix so much. There's a lot
of room to improve on content publishing and sharing.

------
pbhjpbhj
No one seems to be tying DNS into all of this. That gives Google a signal,
aside from the link graph and social data, that tells them which sites are
being used and how often. Its power will increase vastly when IPv6 brings
static IPs and Google can then say that you followed a specific link from
your fridge and made no further DNS hits for 3 minutes, etc.

------
jfb
TRUNCATE <table> CASCADE;

------
cinquemb
Maybe, just maybe, we can expect more from future internet companies. Facebook
and Twitter seem to be headed in the right direction, but I think the key is
actually MAKING something that is USEFUL to whoever you're trying to make
money off of, not whatever BS G+ is playing at.

Before I was even into programming I was into making. Take heed, Google sheep.

------
siculars
The interesting part is not that Google is looking to augment their search
algorithms with human curation (see fb, twitter, bit.ly, et al.), but rather
that links, their principal metric, are losing value.

------
stfnhrrs
DROP TABLE _

------
opendomain
I would build the biggest NoSQL datastore in the world. I would use all public
data - anything that could be searched and I would invite other companies to
put their data on my cloud. THEN we would find a cure for cancer.

~~~
alecco
P(tongue-in-cheek) == .5

~~~
opendomain
Um - no. I was entirely serious about curing cancer. I even gave a speech at
the NY Tech Council on it - <http://www.youtube.com/watch?v=ud1x6VYEc1Q> I
believe we can only solve our big problems by using big data - please do not
downvote me.

~~~
alecco
Repliers can't downvote parent comment. Or at least not within my HN user
level.

Your comment sounds like it was generated from HN headlines mixed with wishful
thinking. How could web data "cure cancer"? Are you indirectly calling the
huge number of cancer researchers idiots for looking in the wrong place?
Please think before you post.

------
natch
The article mentions Google's developers. I don't think Google's developers
are worth much now to Google, because they have no hunger. A friend at Google
says most of the engineers are just waiting to vest. Look at the poor quality
of the speech recognition coming out of Google Voice for an example of the
result of engineers simply not caring. To me Google is the new Yahoo!

~~~
dhconnelly
Having interned there last summer, my observation is that you're completely
wrong. Who knows, maybe they were all just hiding it for three months.

