
Who Wants To Beat Google? - nreece
http://www.realsoftwaredevelopment.com/who-wants-to-beat-google/
======
Eliezer
If you try your hand at machine learning, you learn very quickly that 95% of
your amazing bright ideas don't work. The thing you learn by reading about
popular algorithms isn't how ingenious they are - their popularity tells you
the key fact that they often _actually work_.

Google is an AI company, they just don't advertise the fact. It's quite
possible that someone in Google already tried an approach like this, and found
that it didn't really work. A lot of people had similar ideas about the
Netflix Prize and it didn't work for them.

I don't mean to sound too discouraging here. It's just that brilliant new
notions for data you can integrate into your AI algorithm (and search is an AI
algorithm) more often than not turn out to simply... not... work. So you don't
get enthusiastic about an idea like that because it sounds good - you really
do have to try it. If it turns out to work _great_ in practice, like PageRank,
then you might have a Google killer on your hands. It's just that the
practical performance is hard to guess by eyeballing what a good idea it
sounds like.

~~~
ced
_If you try your hand at machine learning, you learn very quickly that 95% of
your amazing bright ideas don't work._

I wonder why we can't tell the good ideas from the bad ones. There have got to
be some core principles that are not properly known. After all,
backpropagation neural networks are exactly an idea of the "bright and
ingenious" kind, and yet they actually work.

I was very disappointed that the Stanford Machine Learning lectures were
almost exclusively about "how" and not about "why".

~~~
pchristensen
_I was very disappointed that the Stanford Machine Learning lectures were
almost exclusively about "how" and not about "why"._

Maybe they don't know why? You'd have to know the core principles to know why.

~~~
ced
But then... How did they come up with all these learning algorithms? Do _all_
researchers have that 95% failure rate?

------
jwilliams
I disagree with a lot of this article - firstly, I wouldn't call Microsoft
_extremely diversified_. Sure - they have a lot of products - but the
cornerstones are Windows and Office.

Microsoft is fixated on their platform - Windows, Office. Almost every other
play Microsoft has is really a funnel for revenue into these (and their
close associates). Microsoft's play is all about "seamless, easy
integration". That is, you get SQL Server, it runs on Windows, it integrates
nicely into VS.NET - which is integrated into your IE browser.

To innovate, Microsoft needs to move beyond this. They may have the best
engineers in the world, but it's hard to break away if the mentality around
these products persists. Everything needs to work with Outlook, or Live
Messenger, or search, or whatever. Microsoft is all about pre-canned and all
the advantages and disadvantages that brings.

On the other hand, these are hugely profitable products for Microsoft. It's
hard (and perhaps reckless) to let go of this as a strategy.

In my view, Microsoft needs to break itself up a little in order to create
niches and to innovate. This means - don't integrate everything - open things
up - attack markets that aren't already Microsoft mainstays.

The microformats and social network integration as an idea is fine - very
semantic web - but Microsoft simply couldn't execute it at the moment. It
would get tangled up in this machinery.

------
wheels
I'm really interested in how the personalization stuff and link graph are
merged at Google. Part of what we're doing at Directed Edge is merging graph-
based (PageRank and HITS-like algorithms) and matrix-based (collaborative
filtering) into a single model that will allow us to do information filtering
for recommendations. Once you've got both of them loaded up onto the same
model you can do some interesting things clustering-wise. One of the biggest
problems with typical matrix clustering, i.e. what Google News uses, is that
you're limited in trying to move out to neighboring clusters. If you combine
both of them into a single graph model you've got more predictive power in
terms of what you expect for a user to know about and what you expect to be
new information for them.
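
To make the "single graph model" idea concrete, here is a minimal sketch, not Directed Edge's actual method. It assumes user-item "likes" (the collaborative filtering side) and item-item links (the link graph side) are both stored as undirected edges, and it scores items with a random walk with restart from one user. The function names, the `user:`/`item:` prefixes, and the sample data are all made up for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seed, damping=0.85, iterations=50):
    """Random walk with restart from `seed` over an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    if seed not in neighbors:
        raise ValueError("seed must appear in at least one edge")

    nodes = list(neighbors)
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Each node passes a damped, equal share of its rank to its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        # The remaining probability mass restarts the walk at the seed user.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def recommend(edges, user, top_n=5):
    """Rank items the user is not already connected to by walk score."""
    scores = personalized_pagerank(edges, user)
    known = {b for a, b in edges if a == user} | {a for a, b in edges if b == user}
    candidates = [n for n in scores if n.startswith("item:") and n not in known]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:top_n]


# Hypothetical data: "likes" edges plus one item-to-item link.
like_edges = [
    ("user:alice", "item:a"),
    ("user:alice", "item:b"),
    ("user:bob", "item:b"),
    ("user:bob", "item:c"),
]
link_edges = [("item:c", "item:d")]

recommendations = recommend(like_edges + link_edges, "user:alice")
```

With only the likes, alice could never reach item:d; the link edge is what lets the walk step out to a neighboring cluster.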

------
dzorz
> If my mom does a search for something, or if I do a search for something,
> Google treats us exactly the same. My mom loves crafts, and I love software
> development. We should get completely different results.

This is actually not true. Different users _do_ get different results, not
only based on whether they are searching using their local google site or e.g.
google.com, but also based on their search history.

For example, when I searched for GAC (I'm signed in with my google account),
the first result I got was:

<http://en.wikipedia.org/wiki/Global_Assembly_Cache>
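
As a rough sketch of how history-based reordering could work (this is an assumption for illustration, not how Google actually does it): build a word-count profile from past queries and add a small boost to results whose titles overlap with it. The function names, the `weight` value, the base scores, and the sample history are all hypothetical.

```python
from collections import Counter


def interest_profile(history):
    """Count how often each lowercased word appears in past queries."""
    return Counter(word for query in history for word in query.lower().split())


def rerank(results, profile, weight=0.1):
    """Reorder (title, base_score) pairs by base score plus a history boost."""
    def score(result):
        title, base = result
        # Sum the profile counts of the distinct words in the title.
        overlap = sum(profile[w] for w in set(title.lower().split()))
        return base + weight * overlap

    return sorted(results, key=score, reverse=True)


# Hypothetical results for the query "GAC", with made-up base scores.
results = [
    ("Geological Association of Canada", 0.9),
    ("Global Assembly Cache", 0.8),
]
dev_history = [
    "gac assembly install",
    ".net assembly cache",
    "global assembly cache path",
]

for_developer = rerank(results, interest_profile(dev_history))
for_new_user = rerank(results, interest_profile([]))
```

With an empty history the base order is kept, so the boost only changes the ranking for users who have a profile.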

~~~
presty
that only happens if:

a) you're logged in to your Google account

b) you enable web history (which is disabled by default and not very
advertised)

meaning that most people do get the same results (apart from the
local site, which I've come to see is heavily targeted by spammers,
rendering it useless)

~~~
josefresco
Good point, which highlights the problem with this approach. No one wants to
give Google/MS/Facebook all their personal private data just so their searches
can be customized.

------
josefresco
Imagine the frustration if you are a software developer and are actually
searching for Geological Association of Canada and Google/MS only returns
results for Global_Assembly_Cache because of your 'profile'.

Fail.

------
thaumaturgy
The author might want to take a look at some of the existing clustering search
engines (e.g. Clusty, <http://clusty.com/>). If that could somehow be bound to
a user's social network, then it might, maybe, possibly could work.

------
schtog
What was his actual idea?

Integrate more personal information into search?

That is already being done to some degree. That it isn't being done to a
greater degree, I would think, has to do with the complexity of the algorithms
and a lack of reliable/valuable data?

------
pmorici
Doesn't Google learn people's interests with the personal search?

------
hhm
The approach explained in the article is similar to the one used by Popego,
the "interests graph".

------
anthonyrubin
If you use Google Bookmarks it supposedly does take such things into account
when searching.

------
known
PageRank = Wisdom of Crowds

~~~
josefresco
PageRank = Wisdom of SEO Experts.

It's easily manipulated and will someday die when a better mousetrap is
developed.

~~~
known
PageRank = Direct Democracy

------
apsurd
AHHH GAWD i had to stop reading that to maintain my self respect. This guy
does not know the meaning of "logic" and unbiased trials for comparison.

In a nutshell: He actually entered 'dance club' in google and was ANNOYED by
the lack of RELEVANCE AND DETAIL. No no, rather than take .45 seconds to
actually THINK about refining his search query to "popular dance clubs in
miami" he proposes that THE SEARCH ENGINE should sift through all of his
billions of "social utilities" to figure out he was planning a trip to miami
and so OBVIOUSLY he was looking for "dance clubs in miami". But wait, there's
more. His Google-killing idea further specifies that the results should list
clubs that 'his friends' have recommended and like. So forget about putting
"popular" in the search term, and forget that any NORMAL PERSON would just use
yelp, no no no no

the answer is to blame Google for its blatant disregard for people's search
stupidity and FIX EVERYTHING about your query, because you know...its brain is
bigger than yours. duhhhhhhhhhhhhhhhhhhh

Good god that was bad. Just for kicks, uh yeah, it's called the semantic web,
people are working on it, and uh no, you did not invent it.

WOW

