
Deep Learning - The Biggest Data Science Breakthrough of the Decade - jph00
http://oreillynet.com/pub/e/2538
======
stiff
I am sorry, but does anyone else have the impression that a lot of people are
commenting here with great confidence while clearly not knowing anything about
the topic? It takes roughly the equivalent of an undergraduate mathematics
degree, and then a lot of experience in ML itself, to get a decent
understanding of how things like Deep Belief Networks work, so it is no
surprise that none of the comments so far hint at any understanding of
anything specific to deep learning - just general derogatory remarks ("not
used in industry", "overhyped") and pointers to whatever someone heard in an
undergraduate ML class about older types of networks.

Maybe if you don't have anything on topic to say, just do not comment? You
really are not obliged to have an opinion on everything.

(Waiting for the downvotes)

~~~
spikels
It is purely a matter of opinion whether DBNs are "overhyped", but I hope you
would agree that they are currently being "hyped". And I hope you understand
how this can actually damage the prospects of what is likely some very good
technology - like what happened to neural nets. :)

It is also a matter of opinion how widely they are being used in industry.
Certainly they are being studied at many companies, but they do not appear to
be used much in production because of their complexity and high training cost.
This is still cutting edge technology.

In my experience most professionally trained mathematicians and statisticians
are still pretty skeptical of these claims. Wouldn't you agree?

~~~
stiff
It is OK to be skeptical; all I am trying to say is that most of the comments
leave the impression of the poster bending over backwards to say anything at
all related to the topic, often ending up with generic truisms. For example,
how is "model building is only a small part of practical ML" a criticism of
deep learning? How is it on-topic at all?

~~~
spikels
Not my comment but I think they mean that it is a relatively small part of the
overall process and thus not "The Biggest Data Science Breakthrough". Maybe
not a fair criticism but they have a point.

I would love to see a breakthrough in data cleansing - or how about just
standardized coding, labeling and formatting? Unfortunately I've used lots of
3rd party data sources and wasted more time on these brainless activities than
I want to think about. Consider yourself lucky if you only work on web logs
where you control what they look like.

------
ergodic
Well, it is definitely something, but "Breakthrough of the Decade" seems
pretty unlikely to me (given the evidence available to me).

I do not know other examples well beyond the case of Automatic Speech
Recognition, but since that case made a lot of noise, I bet it is responsible
for a reasonable chunk of the deep learning "buzz". Here is my take on it.

If you look at papers from Microsoft such as Seide et al. 2011 and similar
work, the reported improvement over the state of the art (up to 30%) is really
impressive and seems solid. Now, the technique is more or less a very big
multi-layer perceptron (MLP), an approach established two decades ago (or
more). There is some fancy stuff like the deep-belief-network-based
initialization, but it does not make a big difference. The core of the recipe
itself is not very new. What has changed is the scale of data we have
available and the size of the models we can handle.
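
(For anyone who wants to see what the basic object is, here is a rough numpy
sketch of a plain MLP forward pass; the layer sizes are purely illustrative
and not taken from any of the papers.)

    # Rough sketch of a plain multi-layer perceptron forward pass in numpy.
    # Layer sizes below are illustrative only, not taken from any paper.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    layer_sizes = [440, 2048, 2048, 2048, 9000]   # input, hidden layers, output
    rng = np.random.RandomState(0)
    weights = [rng.randn(n_in, n_out) * 0.01
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

    def forward(x):
        """Propagate a batch of input frames through every layer."""
        a = x
        for W, b in zip(weights[:-1], biases[:-1]):
            a = sigmoid(a @ W + b)                 # hidden layers
        logits = a @ weights[-1] + biases[-1]
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)    # softmax over output classes

    probs = forward(rng.randn(4, layer_sizes[0]))  # 4 toy input frames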

By this I am not implying that it is not a very interesting development. But
it is important to bear in mind that the change in the amount of data could
also make other 20-year-old techniques interesting again. On the other hand,
neural networks have had a bad name in recent years for understandable
reasons. They are a black box, or at least less transparent than the usual
statistical methods. This makes them prone to causing the "black box delusion"
effect: you hear a new algorithm is in town, it has fancy stuff like
architectures remotely resembling human thinking or cool math, you cannot
completely grasp its guts, and then, voila, suddenly you are overestimating
its relevance and scope of applicability. MLPs were already hailed once as
"the" tool for machine learning, I think for these same reasons. For me the
right position here is prudent skepticism.

On the other hand, this should also push people to try radical new (and old)
stuff: since the rules of the game seem to be changing, it is not the moment
to be conservative in ML research :).

~~~
badfortrains
The thing that NNs have in their favor that other "20 year old techniques"
lack is their ability to model any mathematical function. There is no
fundamental limit to the complexity of the systems NNs can model (as there is
with other AI techniques).

The problem with NNs is the difficulty of training them. Backpropagation with
random initial weights is simple, but it can easily get stuck in a suboptimal
local minimum, and if the learning rate is too aggressive it may not converge
at all. On the other hand, a slow learning rate requires a large increase in
training time and data. Backpropagation as a method was never really broken;
it simply wasn't efficient enough to be effective in most situations. Deep
belief techniques seem to remedy these inefficiencies in a significant way
while remaining a general solution.
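
(A toy sketch of the learning-rate trade-off, using gradient descent on a
one-dimensional quadratic; note the quadratic has no local minima at all, so
this only illustrates the step-size issue, nothing deeper.)

    # Toy sketch of the learning-rate trade-off: plain gradient descent on
    # f(w) = (w - 3)^2, starting from w = 0.  The true minimum is at w = 3.
    def minimise(lr, steps=50):
        w = 0.0
        for _ in range(steps):
            grad = 2.0 * (w - 3.0)   # df/dw
            w -= lr * grad           # gradient descent update
        return w

    print(minimise(lr=1.05))   # too aggressive: overshoots and diverges
    print(minimise(lr=0.001))  # too timid: barely moves in 50 steps
    print(minimise(lr=0.1))    # reasonable: ends up very close to 3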

Essentially, deep belief networks seem to optimize NNs to the point where new
problems become approachable, and they greatly improve performance on problems
NNs could already solve. The complaint that "the core of the recipe itself is
not very new" seems irrelevant in light of the results.

~~~
iskander
>The thing that NNs have in their favor that other "20 year old techniques"
lack is their ability to model any mathematical function. There is no
fundamental limit to the complexity of the systems NNs can model (as there is
with other AI techniques).

I'm sure that a decision tree can also be viewed as a [universal
approximator](<http://en.wikipedia.org/wiki/Universal_approximation_theorem>)
if you let tree height go to infinity (just as you need to let layer size grow
unbounded with a NN). In practice, this power is at best irrelevant and often
actually a liability (you have to control model complexity to prevent
overfitting/memorization).

And, importantly, being able to theoretically encode any function within your
model is not the same as having a robust learning algorithm that will actually
infer those particular weights from a sample of input/output data.
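
(A quick toy illustration of that last point, assuming scikit-learn; the
labels are deliberately pure noise, so there is nothing real to learn.)

    # An unbounded-depth decision tree memorises noisy training labels
    # perfectly but does no better than chance on held-out data.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 10)
    y = rng.randint(0, 2, size=1000)       # labels carry no signal at all

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    tree = DecisionTreeClassifier(max_depth=None).fit(X_tr, y_tr)

    print(tree.score(X_tr, y_tr))   # 1.0 -- the tree can encode anything
    print(tree.score(X_te, y_te))   # ~0.5 -- that capacity buys nothing here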

------
jph00
I created this talk for the Enterprise Big Data track of O'Reilly's Strata
conference - so it's not a technical description of how deep learning works.
Rather, it's an attempt to show why it's important, and how it fits into
current data science trends.

The "Biggest Data Science Breakthrough of the Decade" in the title is a rather
bold claim, I know... But I think it might be justified. If there are are
bigger breakthroughs, I'd be interested in people's thoughts about what they
might be.

~~~
spikels
Not sure which decade you are talking about. If you mean the 2010s or the next
ten years, we'll just have to see what the next 7 or 10 years bring.

But if you mean the past 10 years, I would have to say that the "distributed
storage and processing" revolution (Hadoop and others) has had a much bigger
impact on data science than all of neural networks, deep networks included.

Why the need to hype what is already a well publicized development? I'm
starting to cringe whenever I hear "data science" or "big data" and I love
this stuff.

------
espeed
Quoc V. Le (<http://ai.stanford.edu/~quocle/>) of Stanford and Google did a
talk today at Univ. of Washington:

    
    
      "Scaling deep learning to 10,000 cores and beyond":
      Presentation Univ. of Washington (March 14, 2013)
      https://www.cs.washington.edu/htbin-post/mvis/mvis?ID=1338
    

You can see one of Quoc's previous talks online:

"Tera-scale deep learning: - Quoc V. Le from ML Lunch @ CMU
<http://vimeo.com/52332329>

You may remember Jeff Dean's (<http://research.google.com/pubs/jeff.html>)
post on this:
<https://plus.google.com/118227548810368513262/posts/PozFb134egM>

The corresponding research at Google...

"Building high-level features using large scale unsupervised learning"

<http://research.google.com/pubs/pub38115.html>

<http://research.google.com/archive/unsupervised_icml2012.html>

Previous HN discussion: <https://news.ycombinator.com/item?id=4145558>

\--

How Many Computers to Identify a Cat? 16,000
<http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html>

------
rm999
I'm concerned deep networks are being overhyped. They're certainly exciting,
but they haven't seen much use in industry yet; it's too early to make claims
about how they have impacted data science.

Also, data science involves a lot more than building predictive models. In my
experience >95% of the effort goes into something other than building a model.
In Kaggle contests you usually concentrate on that <5%, which IMO is the fun
part, but it's not the reality of industry. There are many big breakthroughs
in data science that don't involve model building.

edit: I haven't listened to the podcast yet (at work), my comment is more
about the title.

~~~
lrei
I'm genuinely curious (not being snarky or wtv): what do you put 95% of your
effort into?

~~~
mshron
Not the OP but:

* Problem definition

* Infrastructure

* Data transformation

* Exploratory analysis (arguably part of model work)

* Results presentation

Then again, this is an ongoing disagreement I have with the Kaggle folks over
what constitutes "data science," where I'm pretty confident that "applied
machine learning" is a better explanation of what their contests are about.

~~~
lrei
I see. Thanks.

I'd say data transformation is part of feature engineering (commonly the bulk
of the effort in an ML application), and exploratory analysis is part of model
work. Without those two, one would be building a model out of dreams and
wishes.

"Data science" is probably a poorly chosen description. I'd say common usage
includes infrastructure work, which for most of us amounts to engineering
work.

------
hsshah
We should discourage submissions like this here on HN that require
registration to view the content. So even though this topic is of great
interest to me, I will not be upvoting it. Sorry.

~~~
drucken
While I agree with you, note there is already flagrant "abuse" of this on HN
by the posting of newspaper links with paywalls, in particular the New York
Times.

------
Aloisius
I would love to watch this, but O'Reilly's presentation streamer is awful. I
tried jumping ahead, but the video stream doesn't actually jump with me so I
end up listening to one part and watching another (tried under Firefox, Safari
and Chrome on Mac).

I don't suppose someone has an alternative version somewhere?

------
zby
I registered and listened for maybe 9 minutes - and nothing interesting, just
talk about the talk. How I hate webcasts!

~~~
anigbrowl
Agreed. Don't care much for podcasts either, although I can see the value for
people who drive frequently.

------
daniel-cussen
Does anyone know how RAM-intensive deep learning is? If the answer is "not
very," I think the GA144 might be a good candidate because it has a lot of
CPU-capable (independently branching) cores.

~~~
lrei
I don't have any numbers for you, but it's typically compute-bound rather than
memory-bound. Deep learning is often done on GPUs because of their massive
parallelism.

~~~
wladimir
It may be computation-bound (I'm not sure), but training deep networks
generally does use a lot of memory because of the giant training sets. You're
right that GPUs are a good fit; for example, libraries such as Theano exploit
this.
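
(Some back-of-envelope arithmetic; every size below is an assumption, just to
give a feel for the orders of magnitude involved.)

    # Back-of-envelope memory arithmetic; all sizes are assumptions.
    layer_sizes = [2000, 2048, 2048, 2048, 2048, 2048, 9000]  # illustrative MLP
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
    print(params, "parameters")                       # ~39 million
    print(params * 4 / 1e6, "MB of float32 weights")  # ~160 MB

    # The training set usually dwarfs the model: e.g. 10 million examples of
    # 2000 float32 features each is ~80 GB, which is why data gets streamed
    # in minibatches rather than held in RAM.
    print(1e7 * 2000 * 4 / 1e9, "GB for that hypothetical training set")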

------
patrickk
If you're watching the webcast, skip to 3:20 to avoid all the promo and
buildup fluff at the start.

------
pilooch
Deep learning is so attractive to any AI and machine learning practitioner!
The results are beautiful to witness or read about. This is clearly another
step in a direction that many of us have been waiting for (or working toward)
for a long time!

That said, AI, like every other science, experiences trends and bubbles. If
you take a good look at how machine learning is actually used for problem
solving, deep learning techniques are not exactly the final answer. Typically
they're slow to train, and to my knowledge there is no good 'online' algorithm
yet to train them (e.g. for autoencoders, recursive autoencoders, Boltzmann
machines). Many applications, and a trend toward 'lifelong learning' [1],
require fast incremental learning that yields results in near real time, or at
least in minutes rather than days.
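
(For readers unfamiliar with the term, here is a bare-bones numpy sketch of
what an autoencoder does: reconstruct the input through a narrow bottleneck.
It uses full-batch gradient descent on a toy problem; real models are vastly
larger, which is where the training cost comes from.)

    # Minimal autoencoder: learn W1, W2 so that decode(encode(X)) ~= X.
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.rand(200, 20)                    # toy data
    n_hidden = 5                             # bottleneck size (assumption)
    W1 = rng.randn(20, n_hidden) * 0.1       # encoder weights
    W2 = rng.randn(n_hidden, 20) * 0.1       # decoder weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for _ in range(500):
        H = sigmoid(X @ W1)                  # encode
        X_hat = H @ W2                       # decode (linear output)
        err = X_hat - X                      # reconstruction error
        gW2 = H.T @ err / len(X)             # gradient of MSE w.r.t. W2
        gW1 = X.T @ (err @ W2.T * H * (1 - H)) / len(X)   # ... w.r.t. W1
        W2 -= lr * gW2
        W1 -= lr * gW1

    print(np.mean((sigmoid(X @ W1) @ W2 - X) ** 2))   # final reconstruction MSE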

I've compared a couple of unsupervised machine learning algorithms against
recursive autoencoders: the latter very often learn deeper representations,
but at a computational cost (days versus seconds). Deep learning computation
will improve for sure, though.

[1] <http://cs.brynmawr.edu/~eeaton/AAAI-SSS13-LML/>

------
softbuilder
This is a new topic to me. Are there any whitepapers or open source projects
touching on this?

------
amit_m
Argh! Worst streaming experience ever. Also, no content in the first 20
minutes.

It took me a while to realize that the slides were in a popup that had been
blocked. After reloading, the slides don't match the audio.

------
wmat
Can this be viewed without registering?

~~~
jph00
I don't think so. I even had to register to view - and it's my talk!

I haven't received any marketing stuff from Cloudera or O'Reilly however.
Honestly, I doubt those companies would do anything questionable with
registrations.

~~~
learningram
I wish there was a download option

------
wfunction
Why are they tricking us into giving away personal information?

------
pbharrin
You need Flash to watch this even on Chrome.

