
DeepMind to Work Directly With Google’s Search Team - jamesjyu
http://recode.net/2014/01/27/more-on-deepmind-ai-startup-to-work-directly-with-googles-search-team/
======
nl
It's difficult not to get caught up in the excitement around Deep Learning.
At its best, its unsupervised nature really does seem to approach human
levels of learning (if not intelligence).

Also, I thought this was hilarious:

 _In a 2011 interview that predated DeepMind, co-founder Shane Legg said he
gave only a 50 percent chance that human-level machine intelligence would
exist by 2028._

ONLY?!? By 2028? That is a pretty radical prediction. Indeed, Legg says there
is a 10% chance of human level intelligence by 2018, and 90% by 2050[1].

Legg has multiple papers published in the field of measuring machine
intelligence, so I guess he has a pretty good view of the field.

If that 2028 prediction is even close to correct then I think setting up an
ethics board now probably is the correct thing to do. ( _the notion that
DeepMind asked that Google create an internal ethics board as a condition of
the acquisition, as reported by The Information, had some AI researchers
griping_ )

Edit: Legg's 2011 post on his blog[2] deserves to be read:

 _I’ve decided to once again leave my prediction for when human level AGI will
arrive unchanged. That is, I give it a log-normal distribution with a mean of
2028 and a mode of 2025, under the assumption that nothing crazy happens like
a nuclear war. I’d also like to add to this prediction that I expect to see an
impressive proto-AGI within the next 8 years. By this I mean a system with
basic vision, basic sound processing, basic movement control, and basic
language abilities, with all of these things being essentially learnt rather
than preprogrammed. It will also be able to solve a range of simple problems,
including novel ones._

[1]
[http://lesswrong.com/r/discussion/lw/691/shane_legg_on_risks...](http://lesswrong.com/r/discussion/lw/691/shane_legg_on_risks_from_ai/)

[2]
[http://www.vetta.org/2011/12/goodbye-2011-hello-2012/](http://www.vetta.org/2011/12/goodbye-2011-hello-2012/)

~~~
sp332
Our ethics are far behind our tech and have been for a long time. Maybe we
should make ethics courses mandatory for science degrees, because most tech
people just subscribe to a worldview they pick in high school or college, and
never do any critical thinking about their own projects.

~~~
nl
I think most science degrees include some units on ethics. My computer
science degree had a subject called something like "computing in society"
that kind of tackled it.

But the questions regarding - say - what to do with predictions from a super-
intelligent sentient AI weren't really addressed.

I think one problem is that anyone attempting to address those questions in a
serious fashion risks being branded a crackpot.

~~~
endtime
The MIRI folks (intelligence.org) are trying to address exactly that, though
I'm sure some consider them crackpots.

~~~
eli_gottlieb
Their former incarnation as the Singularity Institute _were_ crackpots. Luke
Muehlhauser has done a hell of a lot to make them a more serious organization.

The Future of Humanity Institute at Oxford University also works on this
stuff, with plenty of mainstream credibility.

~~~
endtime
EY is still just as involved as he was before, no? Vassar having left probably
helps with the crackpot perception (I do not personally have an opinion on the
matter, I just know that that's the impression he gives many folks). But given
that you used the plural "crackpots", I'm wondering if perhaps there were
other folks you think MIRI is better off without? No need to name names, just
curious.

(I agree about Luke, he's a great public face.)

~~~
eli_gottlieb
Generally, I would say they crossed the line from Crackpot to Legitimate when
they switched from:

"We are the only people on this planet who have ANY understanding of THE MOST
IMPORTANT PROBLEM EVER and the only way you puny humans can catch up is to
indoctrinate yourselves by archive-diving our blog, and the only way to SAVE
THE WORLD is to send us all your money before the monster can eat you."

to the new message, which is:

"This is a serious problem which could have dire consequences. We believe we
have a plan for dealing with it, but have stopped claiming to be the _only
people on the planet_ who can deal with it. We are going to be publishing
various portions of our plan as peer-reviewed papers and broad discussions
with the informed public of mathematicians, which means we are admitting to
being mere mortals who in fact have peers. We are also collaborating with this
other organization, who are part of one of the world's premier universities.
While we still take donations, we're going to stop demanding you send us _all_
your money."

Overall, I am still waiting for them to get _even less_ cultish before I'm
willing to donate any more than the price of a Harry Potter novel, but I still
find LessWrongians to be one of the most _downright fun_ bunches of nerds to
hang with on the entire internet.

~~~
endtime
Yeah, I used to hang out at the NYC LW meetup a lot, it was a lot of fun.

I agree that the second message is better in pretty much every way, but I'm
not sure they're mutually exclusive in that I think they fall on different
parts of the "what we say publicly <--> what we believe" spectrum. Though I'd
agree that they seem to have moderated a bit.

------
sjg007
What I am really curious about is the difference between deep learning using
RBMs and methods like PCA, factor analysis, and Bayesian graphical models.
All of them treat a numerical problem in different ways. I'm particularly
interested in the identifiability of the equations (and in introducing
constraints to achieve it), since otherwise you have multiple (infinitely
many) solutions. As far as I can tell, deep learning is basically a
multi-layer RBM with greedy optimization at each level. Maybe that is good
enough for text summarization or robotics, but in a mathematical sense,
depending on your starting vectors, you can arrive at different local minima.
In physical dynamical systems we are typically interested in the global
solution, from which we do further analyses to prove stability etc. Bayesian
networks have loopy belief propagation, and of course PCA/SVD require strict
linear independence.

~~~
dave_sullivan
There's a bit of confusion around this, but "deep learning" is nearly
interchangeable with "neural networks". So PCA, for instance, isn't deep
learning (which definitely doesn't make it bad; it's just definitionally not
deep learning, i.e. not a neural network).

Now, as far as the difference between PCA and, e.g., an autoencoder (which is
similar to an RBM, but different--both RBMs and autoencoders are used to build
"deep neural networks")--well, PCA just learns a linear mapping. You can stack
PCAs on top of each other, I guess, but that is functionally equivalent to a
single layer. So it makes certain assumptions about the behavior of the
data--specifically, that it's linear.

A single neural network layer computes nonlin(Wx + b). A deep neural network
forward pass looks something like this:

W3 · nonlin(W2 · nonlin(W1 · x + b1) + b2) + b3

Nonlin can mean a variety of things--usually tanh, sigmoid, or max(0, x). If
you remove those nonlinearities, the model basically becomes like PCA: a
model that learns a linear mapping. With the nonlinearities, you get a
non-linear mapping which, theoretically, lets you address a wider variety of
problems.
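Here's a minimal sketch of that forward pass in pure Python (toy weights chosen arbitrarily for illustration); the `nonlin` argument makes the role of the nonlinearity explicit--pass the identity and you're back to a plain linear/affine map:

```python
import math

def layer(W, b, x, nonlin=math.tanh):
    """One layer: nonlin(W x + b), applied elementwise to the output."""
    return [nonlin(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def forward(params, x, nonlin=math.tanh):
    """Forward pass through a stack of (W, b) layers."""
    for W, b in params:
        x = layer(W, b, x, nonlin)
    return x

# A toy 3-layer net with 2 units per layer (hypothetical weights)
params = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),
    ([[1.0, 0.3], [-0.4, 0.6]], [0.2, -0.1]),
    ([[0.7, 0.7], [0.2, -0.9]], [0.0, 0.0]),
]

out = forward(params, [1.0, -1.0])                       # nonlinear mapping
linear_out = forward(params, [1.0, -1.0], nonlin=lambda z: z)  # collapses to one affine map
```

With tanh, every intermediate activation is squashed into (-1, 1); with the identity in its place, the whole stack reduces to a single affine transformation, which is the "becomes like PCA" point above.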

But yeah, if you want to learn about deep learning, just learn about neural
networks. It's the same thing.

Also, re: your point on local minima--on the upper end of experiments that
have been done, you've got over a billion parameters; 10 million parameters
would be on the relatively small side. I think it's reasonable for there to
be a variety of possible solutions to any given problem, and the whole idea
is that we're using models that are very flexible due to their high capacity.
The problem historically has been using that capacity effectively. However,
regularization methods like dropout help a lot, and I've found it's easier to
think of neural networks as "feature learners": each one of those neurons is,
theoretically, a feature it's learning. And even so, if you look at the
features learned by convolutional nets (which are easier to visualize because
they're used on vision problems), they tend to arrive at similar features
even when they start from fairly different initializations.

~~~
tinkerdol
> if you want to learn about deep learning, just learn about neural networks.
> It's the same thing.

Hi, could you explain this further? From my limited understanding, couldn't
deep learning be achieved without neural networks, if we define deep learning
as something like automatic feature extraction at one level of a hierarchy in
order to aid classification at a different level? I understand that neural
networks are the most common tool used, but what is the reason for this --
couldn't one also stack the results of other algorithms (decision trees,
SVMs, etc.) into a hierarchy of features?

~~~
dave_sullivan
Sure. In 100% of the articles you've seen in the last few years re: "deep
learning", it's about neural networks. It's not that neural nets are magic;
it's just that stacking them has been called "deep learning", and that's the
name that seems to be sticking. I'll even be cynical and say neural nets had
gotten such a bad rep by the '90s that they needed a new name.

Re: stacking SVMs and decision trees for feature hierarchies: I guess you
could (maybe), but you'd get worse results than if you used neural nets. And
definitionally it wouldn't be deep learning.

------
zvanness
Here's what I said when the story first came out:

"I'm not so sure their technology is as futuristic as everyone thinks it is.
If I had to take an educated guess, I would say it's some powerful AI that
makes their knowledge graph smarter. Currently Google's Knowledge Graph uses
more structured data sets and depends on a mechanism like this:
[http://www.zachvanness.com/nanobird_relevancy_engine.pdf](http://www.zachvanness.com/nanobird_relevancy_engine.pdf)
But the real challenge is to make the knowledge graph update in real time and
take meaning from something as unstructured as a blog post or an email. And to
do something like that requires some really unique AI."

This is most likely what they are going to be implementing:
[http://www.zachvanness.com/nanobird_relevancy_engine_vision....](http://www.zachvanness.com/nanobird_relevancy_engine_vision.pdf)

~~~
hnriot
Why repeat an empty comment? It's clear that you don't really have any
knowledge of, or insight into, state-of-the-art machine learning, so I don't
see the need to paste in what you've already said. It's very hard for those
not familiar with the field to see when something is novel; much of ML looks
"obvious" to a typical developer.

As for Google's KG, updating the model in real time isn't really necessary;
knowledge doesn't typically advance an email or blog post at a time. They
already have that in their search: knowledge is something that is mutually
agreed upon and takes some amount of consensus. Using structured sources
(like Freebase) is much better than trying to do dependency parsing on some
blog post and make sense of it. Powerset tried that; it didn't work then, and
it won't work for at least a few years. ML advances not in leaps and bounds
but by little pieces of the puzzle falling into place. The biggest thing to
happen to the ML world isn't algorithms or fast computers, but the abundance
of available data. Like PageRank, the more data an ML system has, the better
it works. Google has more data than god, and it is through this that they
will push the frontiers of AI, along with IBM and the others working in this
field.

~~~
nl
 _Using the structured sources (like freebase) is much better than trying to do
dependency parsing on some blog post and try to make sense of it. Powerset
tried that and it didn't work then, and won't work for at least a few years._

Evidence extraction from unstructured data is a pretty active area of
research. Clearly structured data is "better", but the issue with many
structured sources is they don't accurately represent knowledge.

For example, answering "What is the capital of Israel" and/or "what is the
capital of Palestine" really depends on both who is asking and who is
answering. Unstructured data is generally good at showing controversy like
that.

~~~
eli_gottlieb
>For example, answering "What is the capital of Israel" and/or "what is the
capital of Palestine" really depends on both who is asking and who is
answering.

Wow, did _you_ just ever open a can of worms.

Legally speaking, the capital of Israel is Jerusalem, and the capital of
Palestine is currently Ramallah. These are also the _practical_ capitals, in
the sense of these cities containing their respective national governments and
institutions.

Things only get thorny when you start trying to ask about what capitals are
_recognized by other countries_. Then you start getting nonsense answers, like
the capital of Israel being in Tel-Aviv (then why isn't the Knesset there?)
and the capital of Palestine being nonexistent due to occupation (then where
does the Palestinian Authority govern from?).

For legal and practical purposes, there are answers to these questions. People
are just embarrassed to contradict their own politics by speaking these
answers aloud. So if we're talking about knowledge extraction, if I were
Google, I would give the legal and practical answers. Telling someone the
capital of Israel is in Tel-Aviv may be politically correct in some places,
but it won't get them to the state office they need to visit.

Anyway, consider me to be wearing my flame-proof suit.

~~~
nl
This is exactly my point (to be pedantic, Palestine actually claims Jerusalem
as its capital too).

~~~
eli_gottlieb
Palestine can claim all it likes: their government offices are not physically
located in Jerusalem, and the location of those offices is what a Search user
needs to know.

~~~
nl
Interestingly, Google says the capital of Palestine is Jerusalem.

I'm not at all sure the location of government offices is a good metric to
base that answer on. The Netherlands and South Africa are good
counter-examples (each separates its legislative and official capitals), and
I suspect there are plenty more[1].

(I didn't downvote you, BTW)

[1]
[http://en.wikipedia.org/wiki/List_of_countries_with_multiple...](http://en.wikipedia.org/wiki/List_of_countries_with_multiple_capitals)

------
gulbrandr
Google will soon be able to predict your searches before you make them. It
would require a mix of this kind of technology and all the data it already
has, and will have, on its users.

This acquisition is just another step in this direction.

