
Some Starting Points for Deep Learning and RNNs - denizyuret
http://www.aistartups.org/2016/03/some-starting-points-for-deep-learning.html
======
devit
Is AI useful right now as a way to provide value for small projects or
startups? Any examples of that?

It feels like most examples of commercial AI usage are features (image search,
automatic face tagging, etc.) that get added to existing products by large
companies with access to large datasets and computing power.

~~~
romaniv
There is more to AI than complicated neural networks. There are simple,
well-studied algorithms that deliver predictable results and work very well if
you apply them to the right problem. Look into K-means clustering, linear
regression and decision trees to get a taste.
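
A minimal sketch of what that looks like in practice, using scikit-learn and a
toy dataset (everything here is illustrative, not a recipe):

    # K-means clustering and a small decision tree on the iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Unsupervised: group the samples into 3 clusters.
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Supervised: fit a shallow decision tree and check held-out accuracy.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
    print(tree.score(X_test, y_test))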

Good introduction to the field: [http://ocw.mit.edu/courses/electrical-
engineering-and-comput...](http://ocw.mit.edu/courses/electrical-engineering-
and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/)

~~~
colllectorof
On the more practical side, there is a good AI book with a misleading name:
Programming Collective Intelligence. Instead of going over abstract
algorithms, it poses a series of practical questions (recommend a movie,
calculate an optimal price, filter spam) and then shows (with source code!)
how to solve them using common AI techniques.
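
For a flavor of the book's approach, here is a rough sketch of the
movie-recommendation idea (user-based collaborative filtering); the users,
movies and ratings below are made up for illustration:

    # User-based collaborative filtering: score unseen movies for a user by
    # the similarity-weighted ratings of other users. Data is invented.
    import math

    ratings = {
        "alice": {"Alien": 5, "Heat": 3, "Up": 1},
        "bob":   {"Alien": 4, "Heat": 3, "Up": 2},
        "carol": {"Heat": 1, "Up": 5},
    }

    def similarity(a, b):
        """Cosine similarity over the movies two users have both rated."""
        common = set(ratings[a]) & set(ratings[b])
        if not common:
            return 0.0
        dot = sum(ratings[a][m] * ratings[b][m] for m in common)
        na = math.sqrt(sum(ratings[a][m] ** 2 for m in common))
        nb = math.sqrt(sum(ratings[b][m] ** 2 for m in common))
        return dot / (na * nb)

    def recommend(user):
        """Rank movies the user hasn't rated yet."""
        scores, weights = {}, {}
        for other in ratings:
            if other == user:
                continue
            sim = similarity(user, other)
            for movie, rating in ratings[other].items():
                if movie not in ratings[user]:
                    scores[movie] = scores.get(movie, 0.0) + sim * rating
                    weights[movie] = weights.get(movie, 0.0) + sim
        ranked = [(scores[m] / weights[m], m) for m in scores if weights[m] > 0]
        return sorted(ranked, reverse=True)

    print(recommend("carol"))  # suggests "Alien" based on similar users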

~~~
toby
Appreciate the recommendation (I wrote that)! Honestly, the book is very old
at this point. It may still be an approachable way to understand what's behind
the algorithms, but there are much better ways to write the code now.

~~~
nswanberg
I wore out my copy. Have you come across any books or courses with the same
approach but more up-to-date code to recommend?

------
KZeillmann
I'm just now getting into neural networks and machine learning. I notice that
there's a lot of research going into RNNs, CNNs, and deep learning.

So far, I've only really delved into NNs with one hidden layer, and I read
that most problems really only require one hidden layer. For what kinds of
problems are multiple layers required, or CNNs, or RNNs? It seems that there's
a lot of "cool factor" there, but I don't know what they bring to the table
that's new.

Where I read about one-layer being "good enough" for most problems:
[http://stats.stackexchange.com/questions/181/how-to-
choose-t...](http://stats.stackexchange.com/questions/181/how-to-choose-the-
number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw)

~~~
shmel
Usually you need multiple layers when you train a NN directly on raw features
(like pixels or words) for perception-like problems. In that case you want to
learn complicated multi-stage transformations to get useful high-level
features.

If you work with already-extracted features (a classical ML pipeline where you
could just as well use SVMs or logistic regression), you can often get by with
a one-hidden-layer network. But deep learning really shines on perception-like
problems.

Don't be fooled by the universal approximation theorem (you can approximate
any function with only one hidden layer). It is beautiful in theory, but in
practice it is like saying that you can write any program in Brainfuck because
it is Turing-complete.
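
To make the shallow-vs-deep distinction concrete, here is a small sketch using
scikit-learn's MLPClassifier (the dataset and layer sizes are arbitrary; for a
real perception task you would use a CNN/RNN on much larger inputs):

    # A one-hidden-layer MLP vs. a deeper one on the 8x8 digits dataset.
    from sklearn.datasets import load_digits
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # One hidden layer: often fine when good features already exist.
    shallow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                            random_state=0)

    # Several hidden layers: lets the network learn multi-stage
    # transformations of the raw input itself.
    deep = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500,
                         random_state=0)

    for name, model in [("shallow", shallow), ("deep", deep)]:
        model.fit(X_train, y_train)
        print(name, model.score(X_test, y_test))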

------
amelius
What are the classes of problems that can be solved by AI today? What are
current challenges?

~~~
TTPrograms
Generally speaking, we have a handle on problems where you have lots of
labeled data for the classes you want to identify ("lots" being quantified
roughly by N/e^D, where N is the number of examples and D is the intrinsic
dimensionality of the data; the larger that number, the more complex a model
you can train).
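
As a back-of-the-envelope illustration of that ratio (the numbers below are
invented, and the intrinsic D of a real dataset isn't something you can just
read off):

    import math

    N = 60_000        # an MNIST-sized training set
    D_raw = 784       # raw pixel dimensionality
    D_intrinsic = 10  # a hypothetical low intrinsic dimensionality

    print(N * math.exp(-D_raw))        # ~0.0: hopeless at the raw dimension
    print(N * math.exp(-D_intrinsic))  # ~2.7: small, but no longer absurd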

Some hard long-term challenges revolve around cases where you don't have a
lot of unlabeled data or examples of the classes you care about. There are
also the technical challenges of training large-scale models without obscene
computational resources.

I also prefer to reserve the term AI for generalized AI, which we're still a
long way from, as opposed to modern classification problems etc. that I would
call machine learning (though I know that nomenclature is uncommon).

EDIT: jimfleming makes a great point about theory as well - we could likely be
much more efficient with better theory for deep neural nets.

~~~
tgflynn
N/e^D would be vanishingly small for most problems on which deep learning is
used. Image recognition, for example, may involve more than 1000 pixels, and
e^1000 is a number that makes the number of atoms in the universe look tiny.

~~~
emcq
The qualifier there is "intrinsic" dimensionality. Images are smooth enough
that there is a much lower-dimensional embedding than one million dimensions
for a 1-megapixel image, particularly in any applied setting.

That said, there is a bigger problem with those bounds: they don't incorporate
model complexity. The VC dimension is much more insightful, because the
complexity of your model and the hypothesis space it represents are important
for proper training. As an example, add a regularization term to your model
and you're no longer doing anything like N/e^D. Convolutions, dropout, etc.
all prevent NN models from becoming too complex to train.

~~~
tgflynn
Do you know of a way to measure "intrinsic dimensionality"?

~~~
emcq
In general it's an abstract concept like Kolmogorov complexity, but there are
some practical approaches.

People often try to estimate the intrinsic dimensionality of a dataset by
looking at how many singular values sit above some threshold, or at how the
reconstruction error changes as you vary the output dimensionality of a
dimensionality-reduction/unsupervised technique like PCA, matrix
factorization, or an autoencoder.

An info theory person might argue entropy and compression ratios are also
insightful.
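
A quick-and-dirty version of the PCA approach looks something like this
(purely illustrative, with synthetic data whose true intrinsic dimension is
known to be 5):

    # Embed 5-dimensional data in 50 dimensions, then count how many
    # principal components are needed to explain almost all the variance.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(1000, 5))              # intrinsic dimension: 5
    mixing = rng.normal(size=(5, 50))
    X = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    print(int(np.searchsorted(cumulative, 0.99)) + 1)  # prints ~5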

~~~
TTPrograms
I'd also note that for typical classification there is an additional concept
of "intrinsic boundary dimensionality", which should be low because invariants
like scale, rotation, and translation drastically reduce it.

------
platz
How relevant are hidden Markov models now? Do they still have unique strengths
that aren't covered by other, newer models?

~~~
praccu
Some state-of-the-art industrial speech recognition [0] is transitioning from
HMM-DNN systems to "CTC" (connectionist temporal classification), i.e.,
basically LSTMs. Kaldi is working on "nnet3", which moves to CTC as well.

Speech was one of the places where HMMs were _huge_, so that's kind of a big
deal.

[0] [http://googleresearch.blogspot.com/2015/09/google-voice-
sear...](http://googleresearch.blogspot.com/2015/09/google-voice-search-
faster-and-more.html)
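
For anyone curious what the CTC setup looks like in code, here is a bare-bones
sketch using PyTorch's CTC loss (all shapes and sizes are arbitrary
placeholders, and a real system would feed in proper acoustic features):

    # An LSTM over acoustic features, trained with CTC loss.
    import torch
    import torch.nn as nn

    T, B, F, C = 50, 4, 40, 29   # time steps, batch, features, labels (incl. blank)

    lstm = nn.LSTM(input_size=F, hidden_size=128, num_layers=2)
    proj = nn.Linear(128, C)
    ctc = nn.CTCLoss(blank=0)

    features = torch.randn(T, B, F)               # (time, batch, features)
    hidden, _ = lstm(features)
    log_probs = proj(hidden).log_softmax(dim=-1)   # (time, batch, classes)

    targets = torch.randint(1, C, (B, 12))         # dummy label sequences
    input_lengths = torch.full((B,), T, dtype=torch.long)
    target_lengths = torch.full((B,), 12, dtype=torch.long)

    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()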

~~~
platz
I'm more interested in whether HMMs are viable as general-purpose tools,
rather than in their applicability to cutting-edge research; or whether they
should generally be avoided for common tasks unless the domain is simple
enough.

~~~
nextos
HMMs are only a small subset of generative models, one that offers quite
little expressiveness in exchange for efficient learning and inference.
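
For reference, the "efficient inference" part is essentially the forward
algorithm, which is only a few lines of numpy (the toy parameters below are
invented):

    # Forward algorithm for a discrete HMM: computes P(observations) in
    # O(T * K^2) time instead of summing over all K^T state paths.
    import numpy as np

    pi = np.array([0.6, 0.4])          # initial state distribution
    A = np.array([[0.7, 0.3],          # state transition matrix
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],          # emission probabilities B[state, symbol]
                  [0.2, 0.8]])

    obs = [0, 1, 0, 0]                 # observed symbol sequence

    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and re-weight

    print(alpha.sum())                 # likelihood of the whole sequence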

~~~
platz
Thanks

