
Stanford Stats 385: Theories of Deep Learning - capocannoniere
https://stats385.github.io/
======
inputcoffee
I feel like asking: did they solve the problem?

Let me see if I can state the problem: neural networks are non-linear because
of their activation functions. You need a differentiable activation so you can
take derivatives and back-propagate the error, more or less.

The consequence of the non-linearity is that you can't do some kind of
shorthand calculation to figure out what the network will do. You have to
crank through the network to see the result. There is no economy of thought.
That is to say, there is no theory.

I am excited that they are working on it, but I would love to have a summary
or overview of how they approach what I consider to be the basic problem.

~~~
bitL
Why is the lack of theory a problem? At some point we have to accept that some
problems are "out of our league" and use whatever is available, even without
fully understanding it. We can't even understand simple specializations in
computer vision, yet we expect to understand a more general method? It's not
as if, 15 years ago, there weren't math theorems proven by enumerating cases
on a computer. I understand it bruises the psyche and pride of some
scientists, but so what? The universe can't be expected to fit into humanity's
collective brain.

~~~
mikebenfield
Because it's much easier to work with and improve something if there's a
theory behind it?

There's a spectrum between "just keep trying tons of crap and see what works"
and "do this simple, well-understood calculation to see exactly what will
work." Is it not obvious why it's nicer to be on the latter end of the
spectrum than the former?

~~~
bitL
Sure, but currently deep learning is more like experimental physics: you try
stuff, see what works, and empirically improve your understanding. Then you
can generalize some heuristics from this and use that "recipe" in the future.
You figured out that ReLU suddenly made something work, yet Swish turned out
better, so you can forget about ReLU now. And since you can treat deep
learning (in supervised mode) as non-linear optimization, I doubt we'll come
up with a proper theory unless P=NP. We can't even understand far simpler
non-linear optimization problems, let alone ones that can be arbitrarily
parameterized at second order...
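(For concreteness, here is a minimal sketch of the two activations named
above. The definitions are the standard ones, not anything specific to this
course; the `beta` parameter is Swish's usual default.)

```python
import math

def relu(x):
    # ReLU: max(0, x) -- exactly zero for all negative inputs
    return max(0.0, x)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); unlike ReLU it is smooth everywhere
    # and slightly negative for x < 0
    return x / (1.0 + math.exp(-beta * x))
```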

~~~
mturmon
But to just restate the comment you're replying to, clearly theory can prune a
_lot_ of branches on the "iterative experiment-based design refinement" method
you are proposing.

Also, I'll mention that you're being too pessimistic about what theory can
accomplish. The learning problem is much more constrained than P = NP.

The use, for example, of a training set of N examples, drawn i.i.d., and
evaluation on samples drawn from the same distribution imposes a lot of
structure. I don't know if you're familiar with VC theory, but it's an example
of the kind of "surprising" guarantees that can be derived in this setting.
Other general examples are weak learning, the bias/variance tradeoff, and (in
SVMs) the notion of large-margin classifiers.

An applicable "theory" of design is what separates engineering from just
mucking around.
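(As an aside, the flavor of guarantee VC theory gives can be sketched
numerically. This is a standard textbook form of the bound, not something
taken from the course materials.)

```python
import math

def vc_generalization_gap(n, d_vc, delta=0.05):
    # A classic (loose) VC bound: with probability >= 1 - delta over an
    # i.i.d. sample of size n, the test error exceeds the training error
    # by at most roughly this amount, for a hypothesis class of VC
    # dimension d_vc.
    return math.sqrt((d_vc * (math.log(2.0 * n / d_vc) + 1.0)
                      + math.log(4.0 / delta)) / n)

# The gap shrinks as the sample grows, and grows with model capacity.
```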

------
eduren
Does anybody know if they plan on releasing the lecture videos? I couldn't
find them on the site and this looks very interesting.

~~~
walrus
They are being recorded and posted to YouTube as unlisted videos. You can find
links to some of them on Twitter[1] and ResearchGate[2]. It looks like
highlights of the lectures are being posted to Twitter and the full lectures
are being posted to ResearchGate.

[1] [https://twitter.com/stats385](https://twitter.com/stats385)

[2] [https://www.researchgate.net/project/Theories-of-Deep-Learning](https://www.researchgate.net/project/Theories-of-Deep-Learning)

~~~
rayuela
Oh, those full lectures on ResearchGate are awesome. Thanks for sharing!

------
AlexCoventry
I'm a bit surprised that Soatto's and Tishby's papers aren't on the reading
list
([https://stats385.github.io/readings](https://stats385.github.io/readings)).
I think they have some of the most interesting theories, at the moment, about
why Deep Learning works.

[https://arxiv.org/abs/1706.01350](https://arxiv.org/abs/1706.01350)

[https://openreview.net/pdf?id=ry_WPG-A-](https://openreview.net/pdf?id=ry_WPG-A-)

~~~
trashtoss
Is anyone looking into/using algebraic topology and sheaves to analyze or
interpret these deep networks?

------
kleiba
What a privilege to study at that college.

~~~
kleiba
From the downvotes, I suppose people thought I was being sarcastic. Quite the
contrary: I was genuinely in awe of what a cool lecture series this is! If
you're into this kind of topic, of course; YMMV. But if you are, you will
agree that this is an excellent class -- it underlines (not surprisingly) what
an outstanding place to study Stanford is.

~~~
inputcoffee
I didn't down-vote you but I suspect people reacted negatively because the
implication is you have to stop learning after college. They may feel you can
always learn it at any time. You can take deep learning and math courses
online etc.

~~~
cgmg
> I didn't down-vote you but I suspect people reacted negatively because the
> implication is you have to stop learning after college.

How on earth does their comment imply _that_?

~~~
inputcoffee
I misread it as "to study that at college" instead of "study at that college"!

I now have no idea why it was down-voted.

------
tequila_shot
Can general public attend these sessions?

~~~
jaytaylor
Technically, no, because you have to be enrolled as a student (not just a
student of life ;).

However, Quora seems to think you may be able to sneak in, at least sometimes:

[https://www.quora.com/What-does-Stanford-do-about-non-Stanford-students-sitting-in-on-classes](https://www.quora.com/What-does-Stanford-do-about-non-Stanford-students-sitting-in-on-classes)

------
pwaivers
The pictures on the first page look exactly like what I expect graduate
students and professors of Stats to look like.

