
Larry Page on Real Time Google: We Have To Do It - Anon84
http://www.readwriteweb.com/archives/larry_page_on_real_time_google_we_have_to_do_it.php
======
kirse
When I watched the LOST season finale last Wednesday I was trying to find out
the answer to the question "What lies in the shadow of the statue" right after
the show ended. (During the show, the answer was spoken in a different
language)

Google real-time search had picked up the answer from both a TV forum and
Yahoo! Answers in about 5 minutes and it took Twitter about 55 minutes before
anyone had an answer in the search results. I felt a bit let down by my first
usage of "real time" search from Twitter.

~~~
omarchowdhury
All hail John Locke.

------
Alex3917
Please don't do this. Google is already useless enough as is. New HN comments
get indexed in like five minutes, but then they're gone in a month or two.
Same with all content lately, it gets indexed really fast and then it's gone.
It's gotten to the point where I'm looking for stuff I know exists and I know
the keywords used on the page, and I still can't find it because Google just
doesn't index that page anymore even though it's relatively popular. (For what
it's worth, I was looking for a lecture that IIRC was submitted here a while
ago that was basically a defense of pure math research in university.)

I actually used Yahoo! this week for the first time in a decade because Google
just doesn't return good results anymore.

~~~
buggy_code
Can you explain this more? Why would HN comments vanish? Doesn't google follow
all links?

~~~
aristus
HN was not designed with SEO and linkage in mind -- after a while pages can
become "orphaned" after they get pushed down by new stuff. No search engine
follows all links. As a practical matter they generally give up after 6-8
links away from the home page.
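
A minimal sketch of the depth cutoff described above, assuming a toy link graph (the page names and the `reachable_within` helper are hypothetical, not any search engine's actual crawler). It runs a breadth-first walk from the home page and stops following links once a page is `max_depth` hops away, so anything pushed deeper is never visited:

```python
from collections import deque

def reachable_within(links, start, max_depth):
    """Return {page: depth} for pages reachable from start in <= max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # depth budget used up on this path; do not follow further links
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical site: a chain of "next page" links, home -> p1 -> ... -> p10.
links = {"home": ["p1"]}
for i in range(1, 10):
    links[f"p{i}"] = [f"p{i + 1}"]

seen = reachable_within(links, "home", max_depth=6)
orphaned = [p for p in links if p not in seen]
print("crawled:", sorted(seen, key=seen.get))
print("orphaned:", orphaned)
```

With a cutoff of 6, pages `p7` and beyond are never reached, which is the sense in which old threads become "orphaned" once newer items push them further from the front page.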

~~~
ntoshev
Google search for this site still returns years old results:

[http://www.google.bg/search?q=site:http://news.ycombinator.c...](http://www.google.bg/search?q=site:http://news.ycombinator.com/+800+days+ago)

(for better results try "800 days ago" with quotes, HN strips them for some
reason)

------
ajju
I have a feeling Larry is being sneaky here and trying to misdirect
competitors. I don't doubt that indexing content in real time and making it
searchable has some utility - see the comment on this article about some
question on the t.v. show Lost - but maybe 1/100 of my searches are like that.
If my search for the missing link IDA is going to be polluted by hundreds of
results of people tweeting about its discovery, I'd rather not have real time
search.

On the other hand, this is not an either-or proposition. I am OK with this as
long as Google keeps the Tweet search results separate but equal (somewhat
like they keep the blog search results separate via blog search, but equal in
that blog posts with good PageRank do appear in search results; although I
can imagine few if any individual tweets having a very high PageRank).

~~~
avichal
2 thoughts: 1) I worked on the search engine and Larry has been saying this
for years. I worked on search quality back in 2005 and even back then he was
talking about indexing everything and indexing it in seconds instead of hours.

2) Search is won on the margins. Yahoo and Google do equally well on most
queries, but users decide which engine is better based on how it performs over
all of the types of searches they have to do. So when you take the 80/100
searches you do that are not real time, that use unique keywords, and where
you know what you're looking for, Y! and G come out the same. It's on those
other 20/100 that Google wins users.

~~~
ajju
Update: Hey, you're the guy who launched Google Transit! I could use your
advice ( See <http://www.ridecell.com/gt/about/> ). Can I email you?

I think indexing everything in seconds could definitely be a competitive
advantage. I haven't tried Yahoo in years and back then it did much worse than
Google. If it has improved this much, it makes sense that the competition is
at the margins.

On the other hand, TechCrunch/Twitter et al's idea of real time search seems
to be limited to indexing Twitter and Facebook updates as they happen. The
arguments they present amount to "Someone tweeted about a plane crash from the
crashed plane". I don't think indexing such tweets is going to be Google's
edge in search. OTOH, I can see using such information to generate Google
Alerts being very useful for some people.

------
greyman
Yes, in recent times I changed my searching habit a bit: I still mostly use
Google, but when I want to find out what are the newest things about some
topic, more often than not, Twitter Search will give better results than
Google. So I think Larry is correct.

Still, Twitter doesn't do "realtime search", it does "realtime twitter
search", so what Googlers ought to do will be more complex.

------
hko
<http://www.scoopler.com/>

------
sachinag
Larry: just buy Scoopler and be done with it.

------
thisduck
How does Google figure out relevance in realtime? With Twitter, it's
user-driven content with tags etc.

But with the net at large, blogs etc, this becomes difficult. Incoming links
etc are hard to determine in real time (primarily because they haven't
occurred yet).

~~~
aristus
The site's PR, uptime/age, update rate, uniqueness of content. By this measure
HN is near the ideal: it has amazing inbound PR but does not link out much.
It's been running fast and fine for ~3 years, the content is often unique, in
the sense that it contains phrases Googlebot has never encountered before.
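
As a rough illustration only, the four signals listed above could be folded into a single crawl-priority number. Everything in this sketch is an assumption made for the example: the `SiteSignals` fields, the normalization caps, and the weights are invented, and nothing here reflects how Google actually ranks or schedules crawling.

```python
from dataclasses import dataclass

@dataclass
class SiteSignals:
    pagerank: float         # assumed normalized to 0..1
    age_years: float        # how long the site has been up
    updates_per_day: float  # how often new content appears
    unique_fraction: float  # 0..1 share of phrases not seen elsewhere

def crawl_priority(s: SiteSignals) -> float:
    """Weighted sum of the four signals; caps and weights are illustrative."""
    age = min(s.age_years / 3.0, 1.0)            # saturate at 3 years
    rate = min(s.updates_per_day / 100.0, 1.0)   # saturate at 100 updates/day
    return 0.4 * s.pagerank + 0.2 * age + 0.2 * rate + 0.2 * s.unique_fraction

# Hypothetical inputs: an established, busy site versus a brand-new blog.
established = SiteSignals(pagerank=0.9, age_years=3.0, updates_per_day=500, unique_fraction=0.8)
new_blog = SiteSignals(pagerank=0.1, age_years=0.2, updates_per_day=1, unique_fraction=0.5)

print(crawl_priority(established) > crawl_priority(new_blog))
```

The point is only that none of these inputs depend on inbound links to the new page itself, so they are available the moment the page appears.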

An experiment: here is a search that matches an exact phrase in this comment.
[http://www.google.com/search?q=%22By+this+measure+HN+is+near...](http://www.google.com/search?q=%22By+this+measure+HN+is+near+the+ideal%22)

At the moment it returns nothing. Within a minute or two this comment will be
the first result.

~~~
foulmouthboy
52 minutes and counting. ;)

~~~
aristus
Yep. Fail. Stuff from yesterday is indexed however:

[http://www.google.com/search?q=confusingly+called+copy-regio...](http://www.google.com/search?q=confusingly+called+copy-region-as-kill)

------
josefresco
Imagine if this had happened 5-8 years ago and instead of using
Twitter/FB/MySpace for the 'activity stream' it used IM statuses from all the
major IM networks (ICQ, AOL, MSN, jabber etc.)

Bizarro world for sure, but interesting to ponder.

~~~
joepestro
I wrote exactly this 4 years ago :)

It was called AwayGrabber (www.awaygrabber.com).

I wrote an overly complex crawler in C to grab away messages from IM networks
as fast as rate limiting would allow, then created a web frontend for viewing
all of the status messages from your friends.

It was cool since in most clients at the time you needed to click on a friend
and select "get info" for each status you wanted to read. Feeds for status
make much more sense. However, I got tired of trying to reverse engineer the
changes in various closed protocols (oscar, etc). So I did more than ponder
this when I was in college, I tried it.

------
mackeeeavelli
I really thought Microsoft was ahead on this when I noticed their Live.com
spider hitting my site 3:1 for Google. That was 1 year ago. What took Google
so long to realize real-time is where it's at?

