
Interview with Yann LeCun on what AI can do - sconxu
https://www.newscientist.com/article/dn28456-im-going-to-make-facebooks-ai-predict-what-happen-in-videos/
======
nicklo
Yann gave a talk at my university last week. It's really astounding how far
deep learning has come from performing classification and regression to now
doing things like reasoning and generation. Does anyone else find this wild?
These are tasks that historically required a tremendous amount of domain-
specific engineering to perform well. And recurrent memory networks just power
right on through them.

I understand that learned, dense vector representations are powerful blocks to
operate on, but hot damn the trend of general systems that require less
domain-specific engineering is an exciting one that blows my mind.

~~~
tachyonbeam
I definitely find it wild. Recent developments make me very optimistic that I
might see useful domestic robots within my lifetime, maybe even something like
Data.

It definitely seems way less far-fetched now that we have machine learning
techniques for semantic segmentation, object and face recognition, reasoning,
voice recognition and synthesis, and even translation and extracting useful
meaning from text/speech.

------
hyperion2010
You know, I see many different big names in the ML field shooting for a
personal 'assistant' or something like that. For whatever reason I have always
thought that what would be preferable is an advocate: a being that would
always have your back, watching out for your best self and for you in
general, and preventing you from getting screwed by someone trying to make an
information asymmetry play or trying to access your data. Somehow I worry that
most big companies aren't going to be trying to create something that puts the
user first. I'd love to be wrong.

~~~
__Joker
I too hope that there will be some Linux equivalent in the personal assistant
space: something that simply runs on my machine/mobile with my data, with me
totally in control of it. Currently this seems a little intractable given the
availability of data and personal computing power, but I hope these will be a
lesser concern a decade from now.

~~~
kybernetikos
Ideally there'd be a standardised API for personal assistants, so you can move
between different services or swap the whole thing out for an open source
alternative.

------
graycat
He was right: For the progress he is seeking now, neural networks stand to
play at most a peripheral role. Instead, he needs some new and quite different
ideas.

That the networks are _neural_ might mean that they could simulate a worm;
that he got more out of them, as, say, non-linear fitting functions, is cute
but not much progress for the next challenges he has in mind.

------
misiti3780
FYI this was submitted a few days ago and no one seemed to care:

[https://news.ycombinator.com/item?id=10521398](https://news.ycombinator.com/item?id=10521398)

~~~
prawn
Sometimes the front page is too busy or something doesn't take. Don't worry
about it - they're just internet points. No point linking to it unless there
are comments that might also be interesting to readers of these ones, right?

~~~
misiti3780
I couldn't care less - I was under the impression HN had URL dup detection to
make sure you can't double-submit within a few months of the original post.

~~~
adenadel
I think dang said recently it was something like 8 hours. Beyond that you can
resubmit. I've had cases where I've submitted something and it just takes me
to the thread where someone else posted it and it automatically gives them an
upvote from me.

------
graycat
"Unsupervised learning"?

Well, setting aside the jargon, one approach is just a statistical hypothesis
test, and a lot is known about that.

~~~
nl
Unsupervised feature extraction (e.g. in visual tasks with backprop) goes well
beyond what would conventionally be called "a statistical hypothesis test".

For example, look at the image of the features learnt by the AlexNet CNN[1].
In a more conventional "statistical hypothesis test" these would be defined by
hand. IMHO a system that automatically extracts those features itself deserves
to be called more than "just a statistical hypothesis test".

[1] "Example filters learned by Krizhevsky et al"
[http://cs231n.github.io/convolutional-
networks/](http://cs231n.github.io/convolutional-networks/)
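
If you want to see those filters yourself, here is a minimal sketch, assuming
a torchvision install recent enough to ship pretrained AlexNet weights:

    import torchvision
    from torchvision.utils import save_image

    # Download AlexNet with its ImageNet weights and pull out the first conv
    # layer's learned filters: 64 kernels of shape 3x11x11, found by backprop.
    model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
    filters = model.features[0].weight.data  # shape (64, 3, 11, 11)

    # Save them as an image grid. They come out as the Gabor-like edge and
    # color-blob detectors shown on the cs231n page; nothing was hand-defined.
    save_image(filters, "alexnet_filters.png", nrow=8, normalize=True)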

~~~
graycat
Clearly my remark was

> ... one approach is just a statistical hypothesis test ...

So we have no disagreement.

E.g., is this JPG an image of a human face? So, for some variables, extract
data on those variables from known ( _supervised_?) human faces and do an
hypothesis test with null hypothesis that the JPG is of a human face.

Or, are these human faces like those human faces ( _unsupervised_ )? Then get
data on the variables and do a _two sample_ test with null hypothesis that the
two collections of faces are the same, i.e., from the same distribution.

~~~
nl
Well yeah.

But that is like saying that IBM Watson is just a statistical hypothesis test:
"Is this answer or this answer correct?"

That is absolutely true, and completely useless.

The hard part isn't telling the difference once you've done the feature
extraction, it's doing the feature extraction automatically itself.

(Actually that undersells it too, because you have so many combinations of
features to evaluate that making that part work is non-trivial too. But at
least that part is just calculus.)

~~~
graycat
We are being too brief.

Here is a relatively non-standard approach to hypothesis testing:

[https://news.ycombinator.com/item?id=10484602](https://news.ycombinator.com/item?id=10484602)

So, that was looking for _bias_. The _training_ was to say which of the people
were the women.

So, could use that to say if these people were like those people. So, could
get some _people recognition_.

Or, go to a valley with a lot of rocks and for each of 1000 rocks measure 10
quantities. Then here is one more rock; did it come from that valley? So,
doing _valley rock recognition_.
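
Here is a minimal sketch of that rock test, with synthetic measurements
standing in for the valley and Mahalanobis distance as one concrete choice of
test statistic (the threshold is set empirically, so no distributional
assumptions on the scores):

    import numpy as np

    rng = np.random.default_rng(0)
    valley = rng.normal(size=(1000, 10))  # 10 measurements on 1000 known rocks
    new_rock = rng.normal(size=10)        # one more rock: from this valley?

    # Score each rock by its squared Mahalanobis distance from the valley mean.
    mean = valley.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(valley, rowvar=False))

    def score(x):
        d = x - mean
        return d @ inv_cov @ d

    # Calibrate the threshold on the known rocks: a 1% false alarm rate by
    # construction, since 1% of valley rocks score above it.
    scores = np.array([score(r) for r in valley])
    threshold = np.quantile(scores, 0.99)
    print("foreign rock" if score(new_rock) > threshold else "looks like a valley rock")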

Here are 1000 faces from North Korea and 500 from South Korea. Measure 10
quantities on each and ask if the faces are different (this would be a multi-
dimensional _two sample_ test, distribution-free).

Here's a more general picture: Take some data. Ask if the data has property X.
Make a _null hypothesis_ that the data does have property X, and with that
assumption take a function of the data and calculate some probability that
would be high if the data had property X. Then see what probability the data
actually gives. If that probability is low, then reject the null hypothesis
and conclude that the data does not have property X. So, have now _recognized_
that the data is not X. It may be, from some further knowledge, that we can
conclude the data has property Y. This might help a self-driving car decide
whether the stuff in the road is just some harmless paper it can drive over or
something dangerous it must avoid, whether the ice about to be stepped on is
solid, whether that knob is to be pulled or twisted, ..., a lot of questions
in _AI_.

I'm being brief. Questions?

~~~
cgearhart
> Take some data. Ask if the data has property X.

Unsupervised learning would ask, "does this bag of data have any properties?"
or "what are the most significant properties in this bag of data?" or "are
there any partitions of this bag of data that seem to be more similar than
others?"

If you know what you're looking for in advance then there are more direct ways
to search for it (like hypothesis testing). Unsupervised learning is what you
get when you relax the requirement to know in advance what it is you want to
test, and relax the requirement that you can tell when you've found it.
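
For a concrete toy version of that last question, here is a minimal sketch
using k-means (one unsupervised method among many) on made-up data, asking
which partition of the bag looks most natural:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    # A made-up "bag of data": two blobs, though nothing in the input says so.
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(4, 1, (100, 5))])

    # Try several numbers of groups without saying in advance what the groups
    # mean; the silhouette score rates how well-separated each partition is.
    for k in (2, 3, 4):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        print(k, round(silhouette_score(data, labels), 3))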

~~~
graycat
Ah, we're getting a C in communications.

> "does this bag of data have any properties?"

WOW! That's one heck of a hard question! There can be a LOT of properties! No
properties might mean _random_. So, one case is old random number testing!
People settled on the Fourier test -- essentially hypothesis testing.

You seem to be saying that hypothesis testing requires knowing something too
specific about what is being looked for. Well, that's true only of some of the
elementary stuff.

But there are the old _distribution free_ _two sample_ tests: Joe teaches
calculus to 30 students, and Mary teaches the same subject to 25 students. Is
the difference in their performance significant?

Specific? We are not saying much about what is different, but we do get
estimates of Type I error, that is, of saying that there is a big difference
when there is not. So, how? Put all 30 + 25 student scores (right, something
specific, unidimensional, but see below) in a pot, stir briskly, pull out 30
and 25, take the difference in their averages, do this 1000 times, get an
empirical distribution of the difference under the hypothesis that there is no
difference, then see where the actual difference falls in that empirical
distribution and estimate the false alarm rate.
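
That pot-stirring recipe is a permutation test; here is a minimal sketch in
Python, with made-up scores standing in for the two classes:

    import numpy as np

    rng = np.random.default_rng(42)
    joe = rng.normal(75, 10, 30)   # 30 scores from Joe's class (made up)
    mary = rng.normal(80, 10, 25)  # 25 scores from Mary's class (made up)

    observed = mary.mean() - joe.mean()
    pooled = np.concatenate([joe, mary])

    # Stir the pot: shuffle all 55 scores, re-deal 30 and 25, and record the
    # difference in averages under the null hypothesis of no real difference.
    diffs = []
    for _ in range(1000):
        rng.shuffle(pooled)
        diffs.append(pooled[30:].mean() - pooled[:30].mean())

    # Estimated false alarm rate: how often chance alone beats the actual gap.
    p = np.mean(np.abs(diffs) >= abs(observed))
    print(f"observed difference {observed:.2f}, estimated p-value {p:.3f}")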

For measuring several quantities not just one, can look for a significant
difference even if have no idea what the heck the cause is. So, are not being
specific about a cause.

Generally in making _AI_ _decisions_, we want to have some sense of being
correct. A first cut approach is hypothesis testing.

A second cut would be, say, looking for a diagnosis. So, have some data and 20
candidate diagnoses. So, do hypothesis tests on the appropriate data for each
of the 20, maybe different parts of the data for each of the 20, and pick the
diagnosis based on the Type I estimates.

Hypothesis testing has a cute and powerful idea: Make an hypothesis of no
effect and use that to calculate some probabilities. There's got to be a good
role for that in _AI_.

~~~
cgearhart
> That's one heck of a hard question!

Exactly. That's the problem. It's not just a question of deciding whether the
data is random or has some particular property, but trying to answer the
question of whether there is _any_ signal buried in the data -- and then
deciding whether that signal is meaningful.

For the most part, unsupervised learning is more like data pre-processing than
an alternative to hypothesis testing. General frequentist statistical
techniques (including hypothesis testing, goodness of fit, etc.) are an
integral part of evaluating the results of unsupervised learning.

~~~
graycat
From a paper I published, go to a server farm and for three months collect
data on each of 10 variables 100 times a second.

Now, in real time, given values on each of those 10 variables, ask if the farm
is sick or healthy. Get a known, adjustable false alarm rate and a relatively
high detection rate.

So, it's a multi-dimensional, distribution-free hypothesis test.

Don't know what patterns or properties are looking for and don't get a
diagnosis, but still, if get a detection at a low false alarm rate, then have
good evidence the farm is _sick_.

So, this hypothesis test is not the same as you have described.

So, is this _unsupervised learning_?

