

Unsupervised Feature Learning and Deep Learning Tutorial - akaul
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial

======
montecarl
I have a question that I don't really know where to ask that relates to
unsupervised feature learning (I think).

Let's say I have a many-body problem, such as the energy of some number of
atoms given their positions. I can calculate this by solving some equation, but
it's expensive. If I wanted to apply some machine learning to the problem, some
form of regression would work if I stuck to a fixed size (some number of atoms
in a particular coordinate system).

More concretely, if I had energy as a function of two atoms, I could fit some
function through the points and get something halfway reasonable.

However, if I want to have three atoms, I have no idea how to treat it. The
learning algorithm has no idea about "atoms" or that their energies might be
composable in some way.

What sort of machine learning could figure out and learn the underlying rules
of physics, "understanding" how the energy should change as a function of the
atoms' positions and which physical rules must be preserved? One of these rules
is that the energy must go to a constant (possibly zero, depending on how the
energy is defined) as you separate the atoms infinitely far apart, and must go
to infinity as the distance between two atoms goes to zero. Also, the function
must be smooth with continuous derivatives.

I haven't made my point very well, but perhaps someone would know what
literature to read about unsupervised learning algorithms that could "learn"
physics.

~~~
tfgg
People have tried applying machine learning to learn many-body potentials in
physics, to speed up quantum molecular dynamics while keeping most of the
accuracy. What you'd do is say that the total energy of the system is a sum of
local potentials, one per atom R, where the input is the local environment of
atom R:

E = \sum_R e(env(R))

You use some method to create features env(R) that make the translational and
rotational invariance of e() easy to enforce, with a radial cutoff beyond some
distance, and then model e somehow. I think the most promising method is
Gaussian Approximation Potentials, which use Gaussian processes to model e and
(what they call) a bispectral decomposition to represent the local environment
around the atom.

<http://prl.aps.org/abstract/PRL/v104/i13/e136403> (free arxiv version:
<http://arxiv.org/abs/0910.1019>)

Without the above simplifications like modelling it as a sum of local
potentials and giving env(R) a cutoff, you would indeed just be fitting a
3N-dimensional function in the case of N atoms. It'd be exact, but it'd also
blow up pretty badly and be utterly nontransferable. Also, the energy surface
isn't necessarily continuous and differentiable -- consider the energy when two
atoms move to occupy the same position.

I suspect that the bispectral decomposition to give env(R) could be improved
by using unsupervised feature learning to learn better features such as "we're
5 angstrom away from a surface". I've seen talks where people have hand-
optimized feature sets to include things like "there's an aromatic ring
pointing at us from 5 angstrom away" that a simple function + cutoff might
miss.

~~~
montecarl
I think I might have read a different paper on GAP before; this one seems to
have more detail on their philosophy. Thank you very much for the link.

You say that: "Without the above simplifications like modelling it as a sum of
local potentials and giving env(R) a cutoff, you would indeed just be fitting a
3N-dimensional function in the case of N atoms. It'd be exact, but it'd also
blow up pretty badly and be utterly nontransferable. Also, the energy surface
isn't necessarily continuous and differentiable -- consider the energy when two
atoms move to occupy the same position."

Those constraints and representations of the problem are the things I would
want the machine learning algorithm to discover. Is this beyond the scope of
state-of-the-art algorithms in machine learning? I understand that making a
good choice for the representation of the problem should make the job of the
learning algorithm easier, but finding the best representation is quite
challenging.

~~~
tfgg
I think it's possible; it'd just take serious computing power. There are a
number of physically guaranteed symmetries which it would be silly to make a
machine learn:

a) Permutation symmetry of identical atoms
b) Rotational symmetry of the system
c) Translational symmetry of the system

I suppose it could learn them approximately, given enough examples, but why
bother? I think it'd be kind of like not using a convolutional neural net for
recognising digits in photos and just using a bazillion more weights and
examples.

I'd say you'd start with a completely general learning model which respects
the above symmetries and then see where that takes you.
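
As a toy illustration (not from the GAP paper), here is a check that a simple
distance-histogram descriptor is already invariant under (a), (b) and (c), so
the model never has to spend examples learning them:

    import numpy as np

    def descriptor(positions, cutoff=5.0, n_bins=8):
        # Toy invariant feature: histogram of all pairwise distances
        # that fall below the cutoff.
        n = len(positions)
        d = np.array([np.linalg.norm(positions[i] - positions[j])
                      for i in range(n) for j in range(i + 1, n)])
        hist, _ = np.histogram(d[d < cutoff], bins=n_bins, range=(0.0, cutoff))
        return hist

    pos = np.random.rand(4, 3) * 3.0
    perm = pos[np.random.permutation(len(pos))]      # a) relabel identical atoms
    theta = 0.7
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    rot = pos @ Rz.T                                 # b) rotate the whole system
    shift = pos + np.array([1.0, -2.0, 0.5])         # c) translate the system

    assert np.array_equal(descriptor(pos), descriptor(perm))
    assert np.array_equal(descriptor(pos), descriptor(rot))
    assert np.array_equal(descriptor(pos), descriptor(shift))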

However, I don't know how you'd make a transferable N-body potential from a
model trained only on some fixed number of atoms. Again, I guess it's kind of
like training a CNN handwriting recogniser on 256x256 images and then applying
it to arbitrarily sized images, which you can only do by assuming locality and
translational symmetry of the features.

------
wslh
What is the place of support vector machines now that deep learning techniques
are going mainstream?

~~~
textminer
Am I crazy, or is the short-run benefit more about unsupervised feature
learning than about classification accuracy? If I recall from my brief reading
on the subject, kernel-aided SVMs and Random Forests are basically equivalent
to a three-layer deep graphical net, but on painfully feature-engineered
inputs. I'd wager that learning features is more useful right now than any
benefit of a super-deep architecture.

(I would love for a practitioner to shoot this down or corroborate my hunch--
awfully new to neural nets, having basically read a monograph, played with
Theano, and watched some Hinton lectures.)

~~~
ihodes
What did you read, and would you recommend it? I'm looking for some good
reading on the subject.

~~~
textminer
Learn and be well: <http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf>

Explains and motivates the use of autoencoders (and sparsity constraints) and
restricted Boltzmann machines, too.

~~~
ihodes
Thank you!

------
maurits
Handouts and two video lectures that accompany this tutorial:

<http://www.stanford.edu/class/cs294a/handouts.html>

------
zachwill
I feel like ensembles are just so much easier to work with — and they can be
incredibly accurate, provided you take enough time to fine-tune the parameters
and provide the right features. Most ML problems that I deal with fit really
well with Gradient Boosting, and it's awesome to be able to see the breakdown
of how the decision trees are voting.
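
As a minimal illustration (scikit-learn on synthetic data, not the commenter's
actual setup), gradient boosting exposes exactly that kind of breakdown through
feature importances and stage-by-stage predictions:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    # Toy data standing in for hand-engineered features.
    X, y = make_regression(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)

    model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                      max_depth=3, random_state=0)
    model.fit(X, y)

    # Which features the trees rely on, and how the prediction for one
    # example accumulates as trees are added.
    print(model.feature_importances_)
    for i, pred in enumerate(model.staged_predict(X[:1])):
        if (i + 1) % 50 == 0:
            print("after", i + 1, "trees:", pred[0])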

~~~
lightcatcher
Using dropout with deep neural networks is an (extremely cheap) way to gain
many of the benefits of using an ensemble. If you want a great overview of the
dropout technique, watch this tech talk by Hinton:
<http://www.youtube.com/watch?v=DleXA5ADG78>

Also, the recent maxout algorithm (from the Montreal group, the authors of
Theano), which got state-of-the-art results on several datasets, is essentially
just an algorithm designed to work particularly well with dropout (as far as I
understand it).
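
A minimal sketch of the core trick ("inverted" dropout in plain numpy, a
simplification rather than Hinton's exact formulation): each training pass uses
a random thinned sub-network, and at test time the full network approximates
averaging over all of them:

    import numpy as np

    def dropout(activations, p_drop=0.5, train=True, rng=np.random):
        # Inverted dropout: zero out units at random during training and
        # rescale the survivors, so nothing changes at test time.
        if not train:
            return activations
        mask = (rng.rand(*activations.shape) > p_drop) / (1.0 - p_drop)
        return activations * mask

    h = np.random.randn(4, 8)                       # one hidden layer's activations
    h_train = dropout(h, p_drop=0.5, train=True)    # random thinned network
    h_test = dropout(h, p_drop=0.5, train=False)    # full network at test time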

------
TheDelta
Thanks

