
Deep Learning 2016: The Year in Review - buss_jan
http://www.deeplearningweekly.com/blog/deep-learning-2016-the-year-in-review
======
AndrewKemendo
_I find it helpful to think of developments in deep learning as being driven
by three major frontiers... Firstly, there is the available computing power
and infrastructure... secondly, there is the amount and quality of the
training data and thirdly, the algorithms._

I'm really glad this was the approach taken, and his examples really show why
these are the important metrics for DL. It's hard to overstate the
dependencies between these three frontiers: radical breakthroughs in
algorithms make it easier to get better results on existing infrastructure;
better data on the same algorithms and infrastructure can transform results;
and on and on.

~~~
IanCal
I think having commonly used tools has had a big impact too. TensorFlow and
the like have allowed people to share and easily build on each other's work,
including shared models. That's not unique to 2016, but I remember the
struggles of trying to get hacked-together MATLAB code working that, it
turned out, was actually slightly different from what must have been run for
the paper. Now I'd expect a lot of work to come with a setup that anyone can
easily get and try themselves.

------
hacker_9
GANs seem really interesting, and a logical step if I understand them right.
As far as I can tell, they essentially make the fitness function more
detailed, leading to higher-quality output from the network as it gets more
accurate good/bad feedback.

~~~
deepnotderp
It's more a combination of two things: 1) finding a clever way to do
unsupervised learning, and 2) the adversarial loss enables kinds of "loss
functions" that wouldn't be possible with, say, MSE. It's almost like having
a human as a loss function.
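To make the contrast concrete, here's a toy sketch (pure Python, made-up
names, nothing from any real GAN implementation) of a fixed loss versus a
learned, adversarial one:

```python
import math

def mse_loss(fake, real):
    # hand-designed loss: average squared distance to a fixed target;
    # every deviation is penalized the same way, regardless of whether
    # the output "looks real"
    return sum((f - r) ** 2 for f, r in zip(fake, real)) / len(fake)

def adversarial_generator_loss(fake, discriminator):
    # learned loss: the generator is rewarded when the discriminator
    # scores its samples as likely-real; the "loss function" itself is
    # a trained model, which is the point made above
    return -sum(math.log(discriminator(f) + 1e-9) for f in fake) / len(fake)

def toy_discriminator(x):
    # hypothetical stand-in for a trained discriminator: any model
    # mapping a sample to P(real) would do
    return 1.0 / (1.0 + math.exp(abs(x)))
```

Because the discriminator is itself trained, the adversarial loss can express
"this doesn't look like a real sample" in ways a fixed formula like MSE
cannot.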

------
bluetwo
There seems to be work on "big fish" to be done, like:

- Weather Prediction

- Stock Market Prediction

- Healthcare (better prediction, better reading of diagnostic tools)

- Poker

~~~
hueving
> Stock Market Prediction

I can assure you this is being done. Anything that works is kept secret.

~~~
deepnotderp
Ditto, understandably, it's kept secret but would be nice to at least see some
of the models, presented outside the trading context. That being said, hft
world still primarily uses linear/logistic regression.

------
Moshe_Silnorin
Naive question, how many parameters do most large models have?

~~~
gwern
Low millions up to a max of 1b or so (excluding special cases like the
"Outrageously Large Neural Networks" paper). The recent Google Translate RNN
was in the neighborhood of 50m parameters, IIRC. It can be a little difficult
to calculate, since you have a lot of weight sharing and tied weights going
on in CNNs and other architectures.
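The weight-sharing point is easy to make concrete: for a standard
convolution, the parameter count depends only on the kernel and channel
sizes, not on how many spatial positions the kernel is applied to. A quick
sketch (usual layer conventions assumed):

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    # each output channel owns one kernel of shape
    # (in_channels, kernel_size, kernel_size); that same kernel is
    # reused at every spatial position, so image size never appears
    per_filter = in_channels * kernel_size * kernel_size
    return out_channels * (per_filter + (1 if bias else 0))

def dense_param_count(in_features, out_features, bias=True):
    # a fully connected layer shares nothing: every input-output pair
    # gets its own weight
    return out_features * (in_features + (1 if bias else 0))
```

E.g. a 3x3 conv from 3 to 64 channels has 1,792 parameters no matter how
large the image, while a dense layer over the same flattened input would be
orders of magnitude bigger.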

~~~
Moshe_Silnorin
So the largest are about the size of a bee brain? Amazing how many
applications you can get from insect-brain sized networks, about a hundred
thousand times smaller than a human brain if a parameter is at all analogous
to a synapse. Seems like human-level AI is a ways off though.

~~~
argonaut
A parameter != a neuron. A single neuron probably has thousands, if not
hundreds of thousands, of "parameters" (of course the vast majority are
probably unrelated to learning - but given we barely understand neurological
learning it's hard to say).

~~~
Moshe_Silnorin
I was comparing parameter counts to synapse counts, not neuron counts.

~~~
deepnotderp
Please don't make these sorts of comparisons between biological neurons and
our "neural networks"; they're really not the same thing at all. We'd rather
call them "differentiable networks", and IIRC, the visual cortex in the
human brain is supposed to have only 6 layers.

~~~
apl
Cortical layers are in no way equivalent to DNN layers.

------
mrfusion
What do you guys think the next big thing after deep learning is and when will
it come?

~~~
gallerdude
Hopefully deep learning gives us some sort of building blocks for a general
intelligence. It's too early to tell so far whether we're getting warmer or
colder.

One thing that is for sure though is that once we have artificial general
intelligence, everything changes infinitely forever.

~~~
rspeer
Let me make several similar statements:

"Hopefully machine learning techniques give us some sort of building blocks
for a general intelligence."

"Hopefully knowledge graphs give us some sort of building blocks for a general
intelligence."

"Hopefully Bayesian models give us some sort of building blocks for a general
intelligence."

"Hopefully rule-based systems give us some sort of building blocks for a
general intelligence."

"Hopefully LISP gives us some sort of building blocks for a general
intelligence."

"Hopefully computers give us some sort of building blocks for a general
intelligence."

People have said approximations to all of these over time. All of these
probably _are_ building blocks. But there's no reason to believe that we're
about to build the top of the tower.

I believe we should understand AI in terms of what it can do for us now, and
that AGI keeps appearing to be 50 years away because it's actually centuries
away and we don't even understand what we don't understand yet.

~~~
gnipgnip
Indeed! I understand that getting symbolic reasoning out of NN-like
architectures is the new "mind-body" problem (J. Tenenbaum?), but we already
have that with PGMs, Markov logic, and all those things.

I really don't see why DNNs change the game fundamentally. Graphical models
really changed the way of thinking in the field; DNNs, in contrast, are
still fundamentally building on things built by Yann LeCun and gang.

------
jspisak
Amazon backing MXNet is pretty awesome, and having it be a 'real' open
project is great for the community.

------
nojvek
I've been playing with openai universe. I would be shit scared by the
algorithm that achieves super human ability. With nvidias super computers, big
labelled datasets and better algos, it's really hard to grasp what the big
player and govt is capable of.

------
WilliamDhalgren
Honestly, whenever I read anything written about AlphaGo, I want to start
pulling out my hair at the inanity of it. Look at this utterly uninformed
comment, for example:

> As far as algorithmic ingenuity goes, this is pretty much all there is to
> it. With all the hype surrounding AlphaGo's victory this year, its success
> is just as much if not more attributable to data, compute power and
> infrastructure advancements than algorithmic wizardry.

... Is anything about this sentence even approximately true? There was
nothing new about the dataset used (to this day I can't understand why they
use the exact datasets they do: KGS for the predictive net and Tygem for the
rollout softmax, both amateur player databases, rather than the GoGoD
database of pro players; other teams seem to get comparable or better
prediction results with it), nor about using a sizable cluster to execute a
Go-playing algorithm (the limit on cluster size was, and is, the point where
the algorithm hits steep diminishing returns, but MoGo was already using
rather big ones), nor about training on such a dataset for Go playing. The
only difference was PRECISELY the algorithms used, in exact contradiction to
the claim here! The nugget of truth is that the component algorithms aren't
terribly innovative in themselves, just their application to this problem;
the training setup and targets, however, are literally groundbreaking.

Rewind a little, to the end of 2014, and you'll see results from an Oxford
team, as well as from Google's own team, demonstrating a large improvement
in accuracy on the move-prediction task: given a board position, predict the
next move from an actual game, using a convnet. That was the first sign that
deep learning had potential in Go, though at that point it didn't make for a
particularly strong player. Systems were getting to a point where they could
bias the search effectively with such a convnet, for decent gains, when the
AlphaGo result was announced and dwarfed these already exciting advances.
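The move-prediction task itself is just multiclass classification over board
points. A rough sketch (names made up, not from any of the actual papers):

```python
import math

def predict_move(board_features, score_fn):
    # score_fn stands in for the convnet: it maps a position to one
    # raw score per board point (361 of them for 19x19)
    scores = score_fn(board_features)
    # softmax turns the raw scores into a distribution over moves
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # the argmax is the "predicted" move; the full distribution is
    # what a tree search would use to bias its exploration
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs
```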

So clearly it's not about "just" using more computers or bigger datasets,
nor just doing the straightforward thing of applying deep learning to the
problem; all of the above was done before AlphaGo. Yes, it helped, but there
was no contest between the 7d KGS amateur rank the best of the rest were
getting and the at-or-beyond-top-human level AlphaGo reached.

AlphaGo came up with a second component, and actually solved a problem
thought infeasible in the computer Go world: creating an evaluation function
for Go. The entire Monte Carlo tree search revolution of the mid-to-late
2000s in computer Go was about sidestepping the problem of evaluating
whether a particular board position is good or bad, by running full
stochastic playthroughs of the game and scoring those instead. AlphaGo, on
the other hand, first created a decent-ish player network (honestly nothing
special, 5d KGS; that's the one fine-tuned by reinforcement learning, and
notably it isn't even part of the final configuration, it effectively just
generates a large dataset, because humans haven't played enough games in
history for this training setup), then generated a large dataset of games
this network was made to play, and then trained (supervised!) a net to
predict the game outcome given a board position (taking just one position
from each game, somewhat conservatively avoiding overfitting that way).
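That one-position-per-game trick might look roughly like this (a
hypothetical sketch; the helper name and data layout are made up, not from
the AlphaGo paper):

```python
import random

def build_value_dataset(self_play_games):
    # self_play_games: list of (positions, outcome) pairs, where
    # positions is the sequence of board states in one game and
    # outcome is +1/-1 for the eventual winner
    data = []
    for positions, outcome in self_play_games:
        # sample a single position per game: successive positions in
        # one game are highly correlated, so using them all would let
        # the value net effectively memorize whole games
        t = random.randrange(len(positions))
        data.append((positions[t], outcome))
    return data
```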

THIS is the genius of the AlphaGo algorithm: it is a Monte Carlo tree
search, biased by a convnet, rolled out by a softmax policy, and crucially
with an evaluation function whose output is mixed 50-50 with rollout scores.
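That mixed leaf evaluation fits in a few lines (a sketch; `value_net` and
`run_rollout` are placeholders for the trained components, not real APIs):

```python
def leaf_value(position, value_net, run_rollout, mix=0.5):
    # blend the value net's judgment of a position with the score of
    # one fast stochastic playthrough from it; mix=0.5 gives the
    # 50-50 weighting described above
    v = value_net(position)    # predicted outcome in [-1, 1]
    z = run_rollout(position)  # actual rollout result, +1 or -1
    return (1 - mix) * v + mix * z
```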

THAT is a completely novel algorithm; nothing in the literature is
particularly like it, and in particular the evaluation function and its mix
with rollouts was not considered possible. It works orders of magnitude
better than any other Monte Carlo tree search tried since 2006, as well as
orders of magnitude better than the other deep-learning-biased approaches
tried since 2014.

And to think that, of all the work done on AlphaGo since 2014, the least of
these things, the 3 days (!!!) spent fine-tuning (!!!) the prediction net by
reinforcement learning to make it stronger (though still just a 5d amateur)
so it could generate the training set needed for another component, is the
only thing mentioned about their approach!?

