
What is the difference between deep learning and usual machine learning? - hunglee2
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/difference-deep-and-normal-learning.md
======
phinance99
Feature Learning.

Deep Neural Networks can learn features from essentially raw data. Usual
machine learning starts with features engineered manually.

DNNs also learn to predict from the features they learn, so you could say (very
roughly) "DNN = usual machine learning + feature learning".

In practice, manually engineering features is a time-consuming "guess-and-check"
process that benefits from domain expertise. Feature learning, otoh, is more
automatic and benefits from data, computing resources, and optimization
algorithms.
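
Roughly, the contrast looks like this in scikit-learn terms (just a sketch;
`extract_features`, `train_images`, and `train_labels` are made-up placeholders,
which is why the `fit` calls are commented out):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Usual machine learning: a human designs the features first.
def extract_features(raw_images):  # hypothetical hand-written step
    return [[img.mean(), img.std()] for img in raw_images]

clf = LogisticRegression()
# clf.fit(extract_features(train_images), train_labels)

# Deep(er) learning: the hidden layers learn features from raw pixels,
# and the output layer learns to predict from those learned features.
net = MLPClassifier(hidden_layer_sizes=(256, 128))
# net.fit([img.ravel() for img in train_images], train_labels)
```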

~~~
Kip9000
No. In all supervised learning, the algorithm learns a model from the data and
generalises it to unseen data. Better generalisation means a better model. In
unsupervised learning (mostly clustering, density estimation, etc.) the same
thing happens, but we don't tell the algorithm what to learn.

Deep learning is not machine learning plus something else. It is a collection
of techniques that overcomes the scalability problems of feed-forward neural
networks. NNs are very difficult to scale over the number of layers. The
standard training method, backpropagation, can't handle many layers because of
vanishing gradients and the computational infeasibility brought on by the
explosive growth of connections.
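
To see the vanishing-gradient part concretely: backprop multiplies in one
sigmoid-derivative factor per layer, and that factor is at most 0.25, so the
gradient reaching the early layers shrinks geometrically with depth. A tiny
sketch (just arithmetic, no framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid's derivative peaks at 0.25 (at x = 0); each extra layer
# multiplies in one such factor, so the gradient decays geometrically.
s = sigmoid(0.0)
best_case = s * (1 - s)  # 0.25
for depth in (2, 5, 10, 20):
    print(depth, best_case ** depth)  # 0.0625, ~1e-3, ~1e-6, ~1e-12
```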

NNs are also very difficult to scale with the additional classification targets
you may require (for example, you have a classifier for categorising 10
classes, but scaling it up to 20 requires a lot of topological changes and
qualitative analysis).

Deep learning addresses the scaling over layers with various techniques
coupled with hardware acceleration (GPUs). Currently this stands at about 150
layers.

~~~
Houshalter
Even experts often use the "feature learning" analogy. I don't think it's
wrong, or at least it's not a bad way of explaining it.

The difference between (deep) neural networks and shallow machine learning is
that NNs _can_ learn arbitrary features. Yes, clustering doesn't require
feature learning. But it is also super limited in the kinds of features it can
learn. Neural nets can learn arbitrary circuits, and other types of functions.

~~~
halflings
Gaussian Processes can learn any arbitrary function. Are they "shallow"
machine learning?

I think the point that the parent makes is valid: most of the advantages of
deep learning over a "simple" feed-forward topology are advances related to
scaling learning and to solving problems encountered with difficult tasks like
image recognition, etc.

I do not know enough about neural nets to say whether that is all there is to
it, but one thing is sure: it's not just about "learning features", although it
has been shown that the output at every layer abstracts some sort of
higher-level features (in the case of image recognition).

~~~
YeGoblynQueenne
>> it's not just about "learning features"

So, I'm in no position to prove this, but my intuition is that any machine
learning algorithm can be configured in a semi-supervised learning set-up,
like deep nets have been. You could train a decision forest classifier, for
instance, to learn in an unsupervised manner. An algorithm I'm developing for
my MSc dissertation is essentially unsupervised recursive partitioning, a.k.a.
decision trees (only first-order rather than propositional).

Well, possibly not _any_ algorithm. But I get the feeling that many
classifiers in particular could be adapted to unsupervised learning with a bit
of elbow grease, at which point you could connect them to their own input and,
voila, semi-supervised learning.

But like I say, I don't reckon I'll be in a position to prove this any time
soon.

------
wwwdonohue
I'm going out on a limb here but I'm guessing if this explanation makes sense
to you, you don't need to be told the difference between deep and usual
machine learning. Could be wrong!

~~~
logicallee
It sounds like you're implying, but don't want to state, that the explanation
is not clear enough for outsiders (such as yourself?).

If I have read your comment correctly, I'll say it for you: as an outsider, I
read this carefully until I gave up because it was too technical, which
happened right at the very top, in the third paragraph:

>Those hidden layers normally have some sort of sigmoid activation function
(log-sigmoid or the hyperbolic tangent etc.). For example, think of a log-
sigmoid unit in our network as a logistic regression unit that returns
continuous values outputs in the range 0-1

All this implies I know all about multi-layer perceptrons - and I don't. I
can't follow the instructions to "think of a log-sigmoid unit in our network
as a logistic regression unit" because I don't know what those terms mean.

Just as I would give up on a recipe if I got to an instruction I didn't
understand. For example, if I read:

>Glaze the meringue with a blow torch, or briefly in a hot oven.

Yeah, uh, no... I don't even know what glazing means, or what is "briefly in a
hot oven". So I just stop reading. When I'm instructed to do something I
can't, I go look at something else unless I'm feeling very adventurous.[1]

This blog post isn't written at my level.

--

[1] as a last hoorah I'll open a tab and Google
[https://www.google.com/search?q=what+is+glazing](https://www.google.com/search?q=what+is+glazing)
- likewise I tried
[https://www.google.com/search?q=what+is+a+multilayer+perceptron](https://www.google.com/search?q=what+is+a+multilayer+perceptron)
but decided after reading the Wikipedia link that it was too "deep" for me.

~~~
kaffeinecoma
A month ago I was in the same place: I would start reading a short blog post
on RNNs/ConvNets/etc., and within 2-3 paragraphs my eyes would glaze over from
the math and other foreign terminology. Frustrating. To try to fix this I am
"auditing" the Stanford course on ConvNets:
[http://cs231n.stanford.edu/syllabus.html](http://cs231n.stanford.edu/syllabus.html)

I'm about 2/3 done with the homeworks, and I understand this stuff now. I'll
never be a data scientist, but I know enough to implement these networks on my
own, and to understand blog posts like this. It's a lot of work for one
course, much more than I remember from my own undergrad years. I had to
revisit Calculus & Linear Algebra too. But if you're genuinely interested in
this stuff you can pick it up.

~~~
kkarakk
"I had to revisit Calculus & Linear Algebra too" \- what resource would you
recommend for this? after being a web developer for a couple of years i find
myself rusty and unable to find good resources for this. Trying to get into
machine learning but i've forgotten most of the math

------
pmoriarty
This may come as a shock to the layperson, but there's more to artificial
intelligence than neural nets, and many non-NN AI approaches could arguably be
said to "learn", and are thus "machine learning" too, depending on your
definition.

I'm thinking of evolutionary algorithms, various other biologically inspired
computation techniques (of which NNs are but one example), more traditional
AI techniques such as expert systems, and a whole host of stochastic,
non-biologically-inspired algorithms.

I'm not super familiar with NNs myself, so I can't say whether the
gigantically disproportionate attention from the media and the research
community is deserved based on NNs' actual superiority in effectiveness
compared to other techniques, or whether they're used and talked about mostly
because that's what most people know.

It would be interesting to hear the thoughts of a non-NN AI researcher
regarding this.

~~~
kmiroslav
NNs are being talked about because they have been exceedingly effective at
advancing the state of the art in areas that were previously at a dead end.

Computer vision is such a field. The results provided by conventional
approaches had been plateauing for years, and then one day Yann LeCun
submitted a paper describing the work that he and his team had done in the
area with neural networks, which blew away the previous results on image
recognition.

Interestingly, this paper was rejected, and it took a full year for the CV
community to finally turn around and accept NNs as a valid approach to this
problem. I believe no one questions that it's the best methodology we have
today.

Speech recognition, automatic translation and natural language processing in
general are other areas that have benefited immensely from neural networks.

~~~
jameshart
So this is what confuses me about what is 'new' with respect to deep learning.
Neural nets are not new - I was aware of the existence of, and some of the
basic ideas behind, neural nets as a technology in the 1990s and I wasn't even
involved in computer science, so I assume that means that even then they were
a mainstream AI technique.

When I read content about 'deep learning' neural nets today, I don't see
anything especially different from what my (admittedly shallow) understanding
of neural nets was back then. So what I'm mainly missing is: what changed? Is
it just that advances in compute power mean that problems for which neural
nets were impractical have now become practical? Is there something about the
way neural nets are employed in 'deep learning' that is different from the
neural nets that were discussed in the past?

~~~
Eliezer
Previously, neural networks were trained by taking single steps down the
direction of sharpest gradient for the network. However, in deep networks with
lots of layers, the backprop algorithm (which I assume you already know about)
got stuck in local minima.

Deep learning got started when Hinton observed that a certain way of training
restricted Boltzmann machines wouldn't get stuck as easily, and hence by
pretraining the network as if it were an RBM and then switching to backprop,
it wouldn't get stuck in a local minimum as early.

As I understand it, nowadays the best method looks something like a
generalization of Newton's Method, wherein the direction you move takes into
account the second differential and not just the first differential or
direction of sharpest descent. You move furthest in the directions that curve
the least, and move the least in the directions that are most sharply curved.
It turns out that this (plus some other tricks) makes it way easier to follow
continuous gradients in big weird parameter spaces, so now it's possible to
train deep nets, which are a kind of continuous gradient in a big weird
parameter space.

Tl/dr: People figured out how to move better through the parameter space of
neural networks, by taking into account the second differential plus some
other tricks. So now we can train deeper nets.
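
A toy sketch of the idea on a one-dimensional quadratic: a Newton-style step
divides the gradient by the curvature, so flat directions get big steps and
sharp ones get small steps (real second-order-ish methods for nets only
approximate this):

```python
# Minimize f(w) = 0.5 * a * w**2 for two very different curvatures a.
def newton_step(w, a):
    grad = a * w        # f'(w)
    curvature = a       # f''(w), the "second differential"
    return w - grad / curvature  # big step where flat, small where sharp

for a in (0.01, 100.0):  # one very flat direction, one very sharp one
    print(newton_step(5.0, a))  # 0.0 both times: the minimum, in one step
```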

~~~
duckingtest
While there are newer algorithms (almost all first order, btw), one of the most
commonly used for deep convnets, SGD with momentum, is from the '80s. It really
is mostly about computing power: one GTX 1080 has more TFLOPS than the world's
fastest supercomputer until 2001 [0]. The actual speed difference is probably
an order of magnitude larger due to the absence of the communication overhead
and latency inherent in sharing work across thousands of separate CPUs and lots
of slow RAM. That would make one GTX 1080 equivalent, for neural net training
purposes, to a supercomputer from 2004.

[0]
[https://en.wikipedia.org/wiki/History_of_supercomputing#Hist...](https://en.wikipedia.org/wiki/History_of_supercomputing#Historical_TOP500_table)
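
For reference, the momentum update itself is tiny. A sketch of plain SGD with
momentum on a toy quadratic (the function and numbers are just illustrative):

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, momentum=0.9, steps=200):
    # Classic SGD with momentum, essentially unchanged since the '80s.
    velocity = np.zeros_like(w)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(w)
        w = w + velocity
    return w

# toy quadratic loss |w - target|^2, minimum at w = [3, -2]
target = np.array([3.0, -2.0])
print(sgd_momentum(lambda w: 2 * (w - target), np.zeros(2)))  # ~[3, -2]
```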

------
mcv
There is no such thing as "usual machine learning". There are many machine
learning techniques; neural nets are one of them, and deep learning is a very
powerful way of training deep neural networks.

Deep Learning is absolutely interesting and powerful, but it's not like all
the other techniques are the same, or necessarily less powerful, for that
matter. Different techniques have different applications.

~~~
shpx
The words "deep learning" don't refer to a way of training neural nets.

It's a marketing term for neural nets that have "a lot" of layers. Neural nets
have been around for over 30 years; calling them deep learning was a smart
re-branding move.

~~~
mcv
From what little I've read, deep learning is a new way of training nets. Deep
neural nets have also been around for 30 years, but backpropagation, their
traditional learning algorithm, isn't as efficient in training the middle
layers.

~~~
p1esk
You should keep reading. Backprop is still the best, and pretty much the only,
way to train neural nets (deep or not). Other ways exist (e.g. weight
perturbation), but no one uses them. EDIT: I have to clarify: backprop is just
one half of the training algorithm; you also need gradient descent, and there
are many variants of it.
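
To make the two halves concrete, here's a sketch of one training step for a
single sigmoid unit (toy code, not any particular library):

```python
import numpy as np

def train_step(w, x, y, lr=0.1):
    # forward pass: a single sigmoid unit
    pred = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    # half one, backprop: compute the gradient of the loss w.r.t. w
    grad = (pred - y) * x  # cross-entropy loss gradient for a sigmoid
    # half two, gradient descent: use the gradient to update the weights
    return w - lr * grad
```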

~~~
mcv
Man, I've been out of Neural Nets a long time. I studied them back in the
early 1990s. Perceptrons and backpropagation. Didn't really keep up with the
state of the art, I'm afraid. Maybe I should catch up.

------
taneq
Deep learning is machine learning involving neural nets with more than one
hidden layer.

~~~
tomp
... and a method to train the hidden layers efficiently.

~~~
_delirium
That's arguable; most successful applications of deep learning aren't
particularly efficient on the training side, which is why they come out of
research groups that have Google-sized compute clusters available, typically
with big arrays of recent GPUs. I suppose they're efficient in the sense of
not requiring more computation than exists on earth.

There have been a bunch of algorithmic advances as well, but some advances
have simply been those increases in compute power: there are techniques that
didn't really work in 1985, where the technique or a close variant now works,
mainly because we didn't have the big arrays of GPUs in 1985 that we do now.
As these get scaled up, the algorithmic advances are being found to not even
be necessary in some cases. For a period it was thought that autoencoder
pretraining was the big algorithmic breakthrough that made it feasible to
train many-layered neural networks, but a number of recent applications no
longer bother with the autoencoder pretraining.

------
Hondor
It has an unusual use of "we" instead of "us" in a few places: "... that helps
we to detect features ...". I wonder if the author originally used "you" and
"your" and then did a search-and-replace to "we" and "our" because the 2nd
person sounds too informal? I notice the former two words don't appear
anywhere.

Re-reading it with "you" and "your", I realize that it works just as well. I
guess it's really just a matter of fashion which person you use. It used to be
3rd person for academic writing. Now it's 1st, but I guess the most informal
way is still 2nd.

~~~
nilved
Could they be ESL?

------
ascendingPig
I don't buy it. Even DL researchers will point to representation learning
systems like word2vec, a shallow NN, as examples of the success of DL
approaches.

My take: "Deep Learning" is performative
([https://en.wikipedia.org/wiki/Performative_utterance](https://en.wikipedia.org/wiki/Performative_utterance)).
An approach falls under the header of "Deep Learning" when used or developed
by someone who identifies as a Deep Learning Researcher.

------
hellofunk
None of these techniques are good in the context of human-style learning,
where a child (or an adult) can learn how to do something new from only a few
cases of trial and error, whereas a machine using any modern technique needs
to see an immense number of examples first (often millions) before it can do
the equivalent task, and with less accuracy.

NNs and others are totally awesome, but we are so far away from true learning
as it is normally meant in the human context.

~~~
selectron
People can't learn things which are truly new from only a few cases of trial
and error either. For example, it takes a long time to learn a foreign
language or become an expert at chess. Humans mainly learn things better
because our training data is usually larger than that of the model for the
task at hand.

~~~
hellofunk
Suppose you want a computer vision model to start recognizing a particular
object (and its subtle variations) in photos, but it has never seen this
object before. How many of these objects must the model be trained on before
it can generalize and recognize the object again? Now show a child some
bizarre object just once, and see if it can find the object in any photos,
even if the object is at different angles or subtly varied. The child will
certainly do better, off just one training example.

------
S4M
Slightly OT, but what exactly is the difference between machine learning and
statistics?

~~~
capnrefsmmat
You may enjoy Leo Breiman's famous article "Statistical Modeling: The Two
Cultures":
[http://projecteuclid.org/euclid.ss/1009213726](http://projecteuclid.org/euclid.ss/1009213726)

What he calls "algorithmic modeling" is what I see as machine learning style
thinking.

Naturally there's a lot of overlap and people in stats and ML often do related
work (our statistics department, at CMU, collaborates a lot with the machine
learning department), but there's a basic mindset difference.

~~~
S4M
Really interesting article. After reading it, it seems that statistics is
about devising a model (like y = ax + b + epsilon) and then fitting it to the
data, while machine learning algorithms are about letting the data make their
own model.
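
A rough sketch of that contrast in scikit-learn, on toy data: a fixed-form
linear model versus k-nearest-neighbours, which lets the data shape the
prediction function:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# y = a*x + b + epsilon: the "statistics" view commits to this form
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X.ravel() + 1 + np.random.randn(50)

stats_model = LinearRegression().fit(X, y)  # estimates a and b
ml_model = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # no fixed form

print(stats_model.coef_, stats_model.intercept_)  # roughly 3 and 1
print(ml_model.predict([[5.0]]))  # roughly 16, read straight off the data
```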

~~~
nfoz
I think a statistical view of that dichotomy is "parametric statistics" vs.
"non-parametric statistics":
[https://en.wikipedia.org/wiki/Nonparametric_statistics](https://en.wikipedia.org/wiki/Nonparametric_statistics)

I see "machine learning" as exploring computational approaches to non-
parametric statistics. The idea of data-driven model-estimation is not outside
the scope of "statistics".

------
MichailP
Can someone say a bit about the minimum number of layers and neurons needed
for a structure to be called an artificial neural network? I have seen papers
with something like 10 neurons (which is very, very tiny, but the authors
claim it does the job), while on the other hand there are Google-sized ANNs.
Thanks :)

~~~
florianletsch
You can call it an artificial neural _network_ (ANN) as soon as you have two
or more artificial neurons connected to each other. As simple as that.

A small number of neurons might already solve some problem you're having. The
XOR problem can be learned by 4 neurons connected to each other.
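
Here is a sketch of such a tiny XOR network with hand-picked weights (a
textbook construction, not learned here; only 3 units do computation, so the
neuron count depends on whether you also count the inputs):

```python
def step(z):
    return 1.0 if z > 0 else 0.0

def xor_net(x1, x2):
    # hidden layer: one unit computes OR, the other NAND
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(1.5 - x1 - x2)
    # output unit: AND of the two hidden units gives XOR
    return step(h_or + h_nand - 1.5)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_net(a, b))  # 0, 1, 1, 0
```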

When you want raw images or similar as an input and want them classified into
100 classes (e.g. look up CIFAR-10 or CIFAR-100), you will need an
architecture with many more neurons.

After all, ANNs are simply a tool. Depending on the task, that tool might need
to be more elaborate. And when you have all those different possible
architectures, you want a common way of naming them. Labels such as Deep
Learning are simply nomenclature for talking about _certain groups_ of
artificial neural networks.

------
boltzmannbrain
Trying to explain deep learning in a single FAQ response is silly. I submitted
a PR with an improved question and answer:
[https://github.com/rasbt/python-machine-learning-book/pull/13](https://github.com/rasbt/python-machine-learning-book/pull/13)
(trying not to change the author's original content much).

------
vonklaus
I think one of the problems is that articles like this (not just about ML, but
about advanced CS and science concepts in general) answer a question at a
complexity well above any person who would really ask it. By that I mean: if
you can follow the casual explanation and the complex concepts that the reader
is assumed to grasp, you probably already know the difference between deep
learning & ML, and thus you read it out of interest, to compare your view or
to get more information to add to your core base on the topic.

That is my 0.02; I couldn't follow this article without doing ancillary
research/wikipedia lookups at least.

~~~
rasbt
Hi @vonklaus, thanks for the feedback. This was actually not intended to be an
"article" but more like an answer to a targeted question (I am the author of
this little write-up). Basically, someone asked me this specific question some
time ago (I think via email), and I answered it with this person's background
in mind. Then I generalized it a bit more and added it to the FAQ section in
the GitHub repo in the hope that it would also be helpful to others. It's
really more of a quick overview, idea, explanation, in contrast to a fully
fleshed-out blog article :)

~~~
vonklaus
Hey, thanks for the response. I am sure it is quite well done for your target
audience. Do you know of any higher-level material that provides a good
conceptual outline but only assumes really general knowledge? This is a bit
above my comfort zone :). I do really like, I think, Joel Grus's writing. His
FizzBuzz with TensorFlow was hilarious. I am unfortunately not particularly
gifted in mathematics, so libraries like that will likely be the furthest I'll
go into ML/DL. Oh, btw, what's the difference ;). jk

cheers

~~~
rasbt
You mean material specific to deep learning that is more general and less
math-heavy? Hmm, that's a good question; the resources I'd reference are all a
bit math-heavy. However, don't be afraid of diving into TensorFlow, it's
really a nice library that takes care of all the tedious mathematical details.
E.g., in contrast/addition to NumPy (leaving the computational efficiency part
out of the discussion for now), it already implements several optimization
algorithms, so you wouldn't have to worry about implementing backpropagation
from scratch or so. Sure, it still requires a bit of linear algebra, but it's
really more straightforward than it seems at first glance :). Maybe you'd be
interested in Keras ([http://keras.io](http://keras.io)); it's a wrapper
around Theano and TensorFlow which provides a really intuitive interface for
building neural nets! Haha, btw, I really enjoyed Joel Grus's post ;)
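
For a taste, here's a minimal sketch of what a small net looks like in Keras
(the layer sizes are arbitrary, just to show the interface):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=100, activation='relu'))  # hidden layer
model.add(Dense(10, activation='softmax'))              # 10-class output
model.compile(optimizer='sgd', loss='categorical_crossentropy')
# model.fit(X_train, y_train)  # X_train: (n, 100), y_train: one-hot (n, 10)
```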

~~~
vonklaus
Great, thanks. I'll check out keras.io, seems perfect. Cheers.

------
platz
"We then connect those "receptive fields" (for example of the size of 5x5
pixel) with 1 unit in the next layer"

I felt this could have been expanded on... I'm still not sure how the sliding
windows map onto units.

~~~
rasbt
I agree with you, I could surely expand this answer (I am the person who wrote
this little answer). Since this was originally _just_ an answer to a question
someone asked via email (if I remember correctly), I didn't go into too much
depth regarding ConvNets, because I just wanted to answer the question
"briefly" in that mail. But if there's demand for it (if it's useful to
others) I may end up writing a tutorial on ConvNets one day :). However,
there's already an excellent one out there; I highly recommend Dumoulin &
Visin's "A guide to convolution arithmetic for deep learning" at
[https://arxiv.org/abs/1603.07285](https://arxiv.org/abs/1603.07285)
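
But to address your specific point briefly: each window position produces
exactly one number in the next layer, namely a weighted sum of the window's
pixels with a shared 5x5 filter. A rough numpy sketch (a plain "valid"
convolution, ignoring channels and the nonlinearity):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # slide the kernel over the image; each window maps to one output unit
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # one 5x5 receptive field -> one unit in the next layer
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

print(conv2d_valid(np.random.rand(28, 28), np.random.rand(5, 5)).shape)  # (24, 24)
```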

------
paulsutter
Deep learning is learning-from-learning (stacking machine learning). And it
works because of backprop. That's the basic difference.

------
vikingyc
There is no difference.

~~~
B1FF_PSUVM
No, the hole they dug now is deeper. Better spades.

It will also take 30 years to get out of, as usual.

Winter is coming.

------
easytiger
Both are facile bullshit designed to eat up CPU cycles. Which, incidentally,
is what major corporations are basing their business models on selling. I've
had clients come to me and say they want to use machine learning. When I ask
what for, it becomes very awkward.

~~~
taneq
Because five years ago you could Google search for "picture of the priest from
fifth element" and get relevant results, instead of pictures of priests, photo
frames, and boron. Right?

