
Yann LeCun vs. Christopher Manning on Need for Priors in Deep Learning - andreyk
http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html
======
YeGoblynQueenne
>> Over the last few decades, innate priors have gone out of fashion, and
today Deep Learning research prizes closely-supervised end-to-end learning
(supported by big-data and big-compute) as the dominant paradigm.

I don't get why deep learning researchers are so hung up on learning _everything_ from scratch. The tendency towards ever more compute and data is just unsustainable. For problems like natural language, where the set of distinct events may be infinite, you can keep throwing data at the problem and you'll _never_ make a dent in it. There are problems that grow at a pace that cannot be matched by _any_ computer, no matter how powerful.

What's more, as a civilisation we have nothing if not knowledge about the
world. We have been accumulating it for thousands of years. It's what makes
the difference between an intelligent human and an _educated_ intelligent
human. And it's a big difference. So, if we have all this background
knowledge, why not use it, and make our lives easier?

~~~
make3
[Deleted]

~~~
Scea91
> it's gone out of fashion because it's been beaten extremely convincingly by
> methods that learned everything from scratch, in every single task

Totally false. By "every single task" you probably mean "every deep learning success story", which would be tautological.

There are tons of tasks where deep learning doesn't quite work yet and you
still have to hand-craft your features.

Example from my domain: try learning a malware classifier that takes just raw binaries as input.

~~~
nl
My understanding is that work on deep learning for malware classification is still pretty early. Having said that, it seems they are getting closer.

[https://pdfs.semanticscholar.org/f1c8/7533e628fd374b9e98f1e1...](https://pdfs.semanticscholar.org/f1c8/7533e628fd374b9e98f1e1ccc18a0e8f195d.pdf)

[https://devblogs.nvidia.com/malware-detection-neural-network...](https://devblogs.nvidia.com/malware-detection-neural-networks/)

~~~
Scea91
Results in this field usually look way better than they would in a production environment.

Regarding the articles you mentioned:

Using a ROC curve for evaluation in this case is a red flag, because it doesn't take the data imbalance into account. A precision-recall curve is far more suitable. You can have a great AUC on the ROC curve while precision is near zero in highly imbalanced problems such as malware detection. Precision is the probability that a positive detection is a true positive, which is usually the measure you are most interested in.

The problem is that precision changes if you change the class priors. Because
of that, the results are always very dataset specific.
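
A rough sketch of the point, as I read it (my own illustration with synthetic numbers, not taken from the linked papers): with roughly 0.1% positives, a scorer with decent ranking ability can post an impressive ROC AUC while its precision-oriented summary (average precision) stays tiny.

    # Synthetic illustration: high ROC AUC, low average precision under heavy imbalance.
    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(0)
    n_neg, n_pos = 100_000, 100                      # ~0.1% positives, e.g. malware
    y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
    # Scores: positives shifted upwards, but far from perfectly separated.
    scores = np.concatenate([rng.normal(0.0, 1.0, n_neg),
                             rng.normal(2.0, 1.0, n_pos)])

    print("ROC AUC:          ", roc_auc_score(y, scores))            # ~0.92, looks great
    print("Average precision:", average_precision_score(y, scores))  # only a few percent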

With that said, I'm not claiming that machine learning or neural networks don't work on this task. They just don't work in the end-to-end manner where you feed raw binaries into some generic architecture, as we can do with images in some tasks.

------
kevinwang
>As an example (28:57), he described how the human brain does not have any
innate convolutional structure – but it doesn’t need to, because as an
effective unsupervised learner, the brain can learn the same low-level image
features (e.g. oriented edge detectors) as a ConvNet, even without the
convolutional weight-sharing constraint.

I think the first part of the sentence should say that the brain doesn't have innate weight sharing (as stated at the end of the sentence), not that it isn't convolutional. I believe the convolutional structure was actually copied from the visual cortex (but with no weight sharing, as far as we know).

~~~
yorwba
Convolutions are mathematically defined by application of the same kernel at
each possible position. If that kernel has finite support, you also get
locality. The visual cortex has locality, but without weight-sharing between
functionally identical neurons, it's not convolutional.
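
A minimal sketch of that distinction (my own illustration, nothing from the talk): a 1-D convolution applies one shared kernel at every position, while a locally connected layer keeps the locality but gives every position its own kernel.

    # Weight sharing (convolution) vs. locality without weight sharing (locally connected).
    import numpy as np

    def conv1d(x, kernel):
        k = len(kernel)  # the same kernel slides over every position
        return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

    def locally_connected1d(x, kernels):
        k = kernels.shape[1]  # one kernel per output position, free to differ
        return np.array([x[i:i + k] @ kernels[i] for i in range(len(x) - k + 1)])

    x = np.arange(8.0)
    shared = np.array([1.0, -1.0, 0.5])
    per_position = np.tile(shared, (len(x) - len(shared) + 1, 1))
    print(conv1d(x, shared))
    print(locally_connected1d(x, per_position))  # same output only because the per-position kernels happen to match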

~~~
empiricus
The visual cortex neurons not only have locality, they also learn/recognize similar features. This was the original inspiration for conv nets: they were a way to learn efficient local features and apply them to the whole image. To me it seems that cortex neurons and the initial layers of conv nets do similar work, but within the constraints of their implementations: biological neurons cannot share weights, and for artificial neurons it is more efficient to learn and compute a dense convolution.

~~~
posterboy
Disclaimer: I'm learning deep learning mainly from HN comments and just want to provoke more insights. I have no idea what weight sharing is or how kernels are represented in networks, but I do know that e.g. a blur filter or edge filter is represented as a convolution matrix.

It is dangerously confusing to reapply neural-net terminology to neuronal nets, isn't it? The weight of a kernel of biological neurons: what is that supposed to mean?

If you haven't stopped reading yet, please consider: if, as I have to assume, you mean there is a specific ensemble of neurons that represents a kernel with given weights corresponding to exactly one area of the retina, then isn't sharing between "pixels" achieved simply by the eye's jittering?

For better or worse, assume I'm the adversary in a GAN and ignore me if it
doesn't make sense.

------
micro_cam
Just started watching the video, so they may mention this, but one thing I find fascinating is that some recent work suggests the optimization algorithm (usually stochastic gradient descent) or the complexity of the loss surface (i.e. having lots of local minima that are almost as good as the global minimum) may actually be seen to induce a kind of regularization prior.

I.e. these seemingly very complex models are actually biased towards finding simpler solutions that generalize well, in a way that often turns out to work better than trying to explicitly learn a simple model.
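
A toy sketch of that implicit-bias idea (my own illustration, not the recent work the comment alludes to): plain gradient descent on an underdetermined least-squares problem, started from zero, converges to the minimum-norm solution among the infinitely many zero-loss solutions, i.e. it quietly prefers the "simplest" fit.

    # Gradient descent on an overparameterized linear model picks the min-norm interpolant.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 100))   # 20 equations, 100 unknowns: many exact solutions
    b = rng.normal(size=20)

    w = np.zeros(100)
    for _ in range(50_000):
        w -= 1e-3 * A.T @ (A @ w - b)        # gradient of ||A w - b||^2 / 2

    w_min_norm = np.linalg.pinv(A) @ b       # the minimum-norm zero-loss solution
    print(np.linalg.norm(A @ w - b))         # ~0: training loss driven to zero
    print(np.linalg.norm(w - w_min_norm))    # ~0: GD ended up at the min-norm solution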

------
nextos
Can anyone point to any reference where models depart from "closely-supervised
end-to-end learning"?

In the article, LeCun & Manning argue this paradigm has some limitations, and I agree. I think the field will evolve towards systems that combine probabilistic logic-based engines (which represent formal causal reasoning) with lots of deep models (which represent intuition, hypothesis generation, and specialized tasks like vision).

~~~
make3
"Unsupervised Neural Machine Translation",
[https://arxiv.org/abs/1710.11041](https://arxiv.org/abs/1710.11041)

