Major advancements in Deep Learning in 2016 (tryolabs.com)
346 points by sameoldstories on Dec 7, 2016 | 75 comments



The big question with all this stuff, to me, is whether we've just figured out a couple of new tricks (primarily around neural nets processing 2D data and word sequences for translation) and are now going to hit a new plateau in machine learning - or whether "this time it's different" and we're going to see similar improvements year after year for the next decade or more...


Even if the only major impact of current deep learning methods on culture were to squeeze out an additional 1-5% of performance on every task set in front of them, the fact that they have made large-scale speech recognition and image recognition good enough for public use would be enough to call it substantial progress.

But I think one of the major successes of the deep learning renaissance has been the ability to embed different 'worlds' into the same vector space (French and English, text and music, images and text, etc.). By co-locating these different worlds, we gain the ability to perform more accurate search, to create images from text as in the "Generative Adversarial Text to Image Synthesis" paper, and to tackle a wide variety of other multi-modal tasks. We can multi-embed almost anything, even things that are nonsensical. You want to make a music-to-font translation system? Or a sneaker-to-handwriting generator? Gather the training data, and the world is now your oyster. Deep learning as differentiable task pipelines has only begun to scratch the surface of what is possible.
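
To make the 'co-locating worlds' idea a bit more concrete, here's a toy sketch of the usual recipe (plain numpy, made-up dimensions, single linear layers standing in for real encoders): project each modality into one shared space and train with a ranking loss so matching pairs score higher than mismatched ones.

    import numpy as np

    # Toy sketch: map "image" features and "text" features into one shared space
    # with single linear projections (stand-ins for real encoders), then score
    # matches by cosine similarity. Dimensions and data are made up.
    rng = np.random.RandomState(0)
    d_img, d_txt, d_shared = 512, 300, 128
    W_img = rng.randn(d_img, d_shared) * 0.01
    W_txt = rng.randn(d_txt, d_shared) * 0.01

    def embed(x, W):
        z = x.dot(W)
        return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit-normalize

    def ranking_loss(img, txt_pos, txt_neg, margin=0.2):
        # Hinge loss: the matching caption should score higher than a random one.
        zi, zp, zn = embed(img, W_img), embed(txt_pos, W_txt), embed(txt_neg, W_txt)
        return np.maximum(0.0, margin - (zi * zp).sum(-1) + (zi * zn).sum(-1)).mean()

    imgs, caps, negs = rng.randn(32, d_img), rng.randn(32, d_txt), rng.randn(32, d_txt)
    print(ranking_loss(imgs, caps, negs))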


> "this time it's different"

It may be different for other reasons, but the main difference I note today is the number of open-source AI/ML platforms that are trivial to install, use, play with, and experiment on at close to the peak computing capacity of today's hardware. Exploring the vast search space of reality has never been easier or faster than it is today.


Perhaps we're on a plateau with good real-life applications then :)


Probably true for the "internet scale" applications. But my impression is that there's still lots of medium-sized opportunities to build interesting things around the edges, in fields outside of the mobile/PC ecosystem.


What are some tools, frameworks, or platforms that you enjoy or recommend (for starters)?


Here's a chart: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_so...

I'd recommend starting with Theano, TensorFlow, anything Python. I do like Torch's raw speed on CUDA, though.
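
To give a sense of how low the barrier is, a first TensorFlow experiment can be as small as the sketch below (era-appropriate graph/session API, random stand-in data rather than a real dataset).

    # A softmax classifier on 784-dim inputs, trained on fake data just to show
    # the basic define-graph-then-run workflow.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        xb = np.random.rand(128, 784).astype("float32")                      # fake images
        yb = np.eye(10)[np.random.randint(10, size=128)].astype("float32")   # fake labels
        for _ in range(100):
            _, l = sess.run([train_step, loss], feed_dict={x: xb, y: yb})
        print(l)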


You could say the same about smartphones. Mobile computing was a couple of new tricks and features that only really became powerful when people had computers in their pockets (like GPS), and for the most part the real discoveries have already been made. But it's all a step in the process, and there's still a ton of applications present-day ML can tackle.

We haven't created true AI yet, but that's ok. We can make things better than before as we work on an even more advanced future.


Beyond that, as an outsider looking in I may be off base, but the rise of machine learning seems mostly fueled by the rise of GPGPU. So I would say it's not different, in that we experienced something similar with regular CPUs until about 2007. But it may be different depending upon how much, and how long, we can continue to cheaply speed up GPUs.


It's the rise of cheap computation and the availability of enormous data sets.

It's a bit of an exaggeration to say so, so don't take it literally, but in many ways the NN work is where it was when Papert and Minsky wrote the "Perceptrons" book in 1969. But as Stalin is supposed to have said, "quantity has a quality all its own" (thus the immense quantity of cycles and data people have at their fingertips causes a discontinuous change in functionality). Don't over-read this; I don't mean to denigrate a lot of cool work done in the past few years. But conceptually you are correct on the computation side.


This is fair, and one should certainly acknowledge the progress we have made as a functionality all its own, but there seems to be something else at work in the deep learning paradigm. I think (ha) we are dipping our toes in the water of intelligence replication, and that deep learning is a very real crystallization of hundreds of years (using Descartes as my x-axis) of scientifically accurate data amalgamation on what it is we are doing when we maintain a thought. So while the Perceptron is definitely the architecture for passing from the realm of information to the realm of data, what fascinates me is the proximity to actual human reasoning occurring presently within the 'mind' of, say, Watson, which, I have to say, is an incredibly providential name given IBM's founder, Sherlock's best bud, and James Watt's influence on electricity.

It's almost as if intelligent design isn't all that far fetched....lol


If that is the case, if the limiting factor was hardware, then we should see continuing growth for a good stretch going forward. There are quite a few companies looking to build specialized chips for machine learning, that could outcompete GPUs the way GPUs outcompete CPUs in this field.


>> are now going to hit a new plateau in machine learning- or whether "this time it's different"

I think it is actually both, yes we are going to hit a new plateau and yes this time it's different.

It is different not because we have found something profoundly new, but because we are able to quickly, easily and successfully experiment with huge (deep) new neural network architectures and learning methodologies.

This has become possible because of a combination of factors that came together towards the end of the 2000s: much more computation power (GPGPUs), much more data available online, "simple" insights such as progressive training of deep nets by stacking (autoencoder) layers, "Hey! Stochastic gradient descent works quite well actually!", dropout to improve generalisation capabilities, etc.

Great open source libraries such as TensorFlow, Theano and others make it even easier to run experiments. A framework like Keras even abstracts over TensorFlow & Theano, so you don't have to worry about which deep learning framework is used underneath.
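
A rough Keras sketch of that workflow, using the dropout and plain SGD mentioned above (arbitrary layer sizes, real data left out):

    # The same code runs on top of either TensorFlow or Theano.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential([
        Dense(512, activation='relu', input_shape=(784,)),
        Dropout(0.5),                      # dropout for better generalisation
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
    # model.fit(x_train, y_train, batch_size=128)  # with real (x_train, y_train)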

So we have shifted to a much higher gear when it comes to machine learning research, and it will stay that way for a while. Computing capabilities keep expanding: GPGPUs become ever faster for deep learning, but Intel also has the Xeon Phi Knights Landing with 72 cores and upcoming variants with deep learning specific instructions (Knights Mill).

On the other hand we will definitely hit a plateau:

1) To make truly intelligent systems, we need to encode a lot of knowledge; knowledge that is common to us, but not at all to machines.

Bootstrapping a general AI with human-like intelligence will prove very difficult. I think such AIs will need to develop just like children acquire knowledge and cognitively develop. The problems we will encounter in achieving this are of a whole other kind; for one, just imagine how much time it will take before we get this right!

Imagine an AI that learns for a few years but fails to improve: can we reuse what it has learned in a new and better version of the AI? Will we have to capture all of its experiences to relearn a new version from scratch?

2) Apparently the human brain has about 100 billion neurons and trillions of connections; AlexNet (2012) has 650,000 neurons and 60 million parameters. We have grown the networks considerably since then, but compared to the complexity of the brain we have a (very, very) long way to go.

3) FLOPS/Watt : this is going to play an ever growing role in the success of AI. Our brain is incredibly efficient when it comes to energy use. We shouldn’t need a power plant for every robot we deliver to customers, right?


The former, if all things remain the same in terms of computing power and availability of data. I think betting against improvement in either of those areas is a bad bet, so I'm optimistic we'll continue to see really interesting things in the near future.


> if all things remain the same in terms of computing power and availability of data

Right, but that's exactly the thing that is changing. All the promises of "big data" can now actually be realized. The trick, for the data sources that were set up long ago, is cleaning them and making sure they are structured correctly.


I think the former


This is literally just "whatever looked cool"... Where are AlphaGo, neural architecture search through RL, learning to play in a day, WaveNet, and PixelCNN/RNN? That's just off the top of my head...


Actually, I think AlphaGo would be more in the "looked cool" category. It was an applied engineering task, not so much a fundamental change of approach like adversarial networks. That's not to take away from the monumental nature of the feat, but it seems like this blog post was more about higher-level developments. Similarly, PixelCNN and WaveNet were "what can you do when computing power is no object".

I would have liked to see something on RL^2 / "learning to reinforcement learn" which do seem like huge developments to me, but maybe are too new to see the impact of yet.


I don't know. I see a lot of papers getting great results on tough domains like program writing using the idea of tree search guided by NNs, an idea that may be very old but that AlphaGo brought to everyone's attention by demonstrating what modern NNs could do within it. Any domain that is tree- or DAG-structured and that you have a simulator for can now be tackled much more effectively.
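
As a rough illustration of the combination, not AlphaGo itself: a stripped-down, PUCT-style selection step in which a (here hypothetical, untrained) policy net's priors decide which branches the search bothers to visit.

    import math

    # policy_net is a stand-in; a real one would be a trained network whose priors
    # focus the search on promising branches instead of expanding uniformly.
    def policy_net(state, actions):
        return {a: 1.0 / len(actions) for a in actions}   # uniform prior placeholder

    class Node:
        def __init__(self, prior):
            self.prior, self.visits, self.value_sum, self.children = prior, 0, 0.0, {}
        def value(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def expand(node, state, legal_actions):
        for a, p in policy_net(state, legal_actions).items():
            node.children[a] = Node(prior=p)

    def select_child(node, c_puct=1.0):
        # PUCT-style score: current value estimate + prior-weighted exploration bonus.
        total = sum(ch.visits for ch in node.children.values()) + 1
        return max(node.children.items(),
                   key=lambda ac: ac[1].value()
                   + c_puct * ac[1].prior * math.sqrt(total) / (1 + ac[1].visits))

    root = Node(prior=1.0)
    expand(root, state="start", legal_actions=["a", "b", "c"])
    root.children["a"].visits, root.children["a"].value_sum = 2, 1.0   # pretend stats
    print(select_child(root))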


I agree with this; I've seen people writing off AlphaGo as if it offered very little new conceptually over MCTS (just with more hardware). But the power of the NN was surprising.

There's a paragraph from the AlphaGo paper that I think speaks to this:

"We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte-Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi."


Heuristic search is a classic technique for getting things done in AI. It just wasn't good at recognizing the value of some moves, a.k.a. scoring. Today's AI people have ANNs that are great at scoring. Their tunnel vision just made them focus almost exclusively on using such tech. Then a team combines two proven methods to do what each is good at. Big results follow.

These fads and tunnel vision happen all the time. The most interesting stuff often happens when a talented person inside the bubble opens their mind to bring in something outside it.


I think it's more a demonstration than anything, but unlike those demonstrations, no one expected just how powerful deep nets could be. No one expected they could win at Go; by contrast, everyone (in DL) expected that they could do these other things.

I do agree, though, that meta-learning is the future and very interesting; I'm especially interested in the fact that DRL can now construct nets better than humans (and you can actually speed it up with an in-house technique I'm not allowed to share :( ).

IMO, WaveNet and PixelCNN/RNN demonstrate the power of convolutions as a general-purpose tool and broke the reign of RNNs/LSTMs for many tasks.


(first paragraph referring to AlphaGo)


AlphaGo was a serious milestone and message to the public that AI has broken another barrier.

http://www.nytimes.com/1997/07/29/science/to-test-a-powerful...

In 1997 this was the general thinking about it: "It may be a hundred years before a computer beats humans at Go -- maybe even longer," said Dr. Piet Hut, an astrophysicist at the Institute for Advanced Study in Princeton.


I agree it was an impressive feat.

I don't agree that it was clear it was as far off as that:

I believe that a world-champion-level Go machine can be built within 10 years, based on the same method of intensive analysis—brute force, basically—that Deep Blue employed for chess

-- Feng-Hsiung Hsu, 2007, http://spectrum.ieee.org/computing/software/cracking-go


Hi, sorry for the bother but the terms "RL^2" and "learning to reinforcement learn" aren't very googleable - would you mind sharing a link to the paper you're discussing?


Actually, googling 'learning to reinforcement learn' links to this paper - https://arxiv.org/abs/1611.05763


Meh. It's the old feud between fundamental research and applied engineering. You need both to keep moving forward. AlphaGo was a bit more in the applied camp.


No, these are innovative _techniques_ for designing and training networks. What you're describing are just good _applications_ that "look cool".


Any links?


I can't understand why people keep saying "AI has been democratized" when all the big research is being done by highly credentialed PhD scientists working for rich American tech companies.


Today, I downloaded the model weights for a state-of-the-art R-CNN architecture and ran some object recognition on some pictures. It took me 15 minutes to install the dependencies (a bunch of one-liners), modify the code a little, and get it running on my data; all the internals are explained in the papers and can be modified in the scripts. Just saying, it could be worse.
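
Roughly what that workflow looks like; this isn't the R-CNN code I used, just the closest widely known analogue (a pretrained ImageNet classifier in Keras, with a hypothetical local photo), but the "download weights, run on your own pictures" experience is the same.

    import numpy as np
    from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
    from keras.preprocessing import image

    model = ResNet50(weights='imagenet')              # weights download automatically
    img = image.load_img('my_photo.jpg', target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    print(decode_predictions(model.predict(x), top=3)[0])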

To be fair, I agree that the big papers mostly come from Google & MS, and I wish research were a real democracy.


Go and do some cutting edge research, and publish it. Nobody's stopping you.


I imagine the need to pay bills is what's stopping most people


Driving a car has largely been democratized, yet most of the car-driving research is done by highly credentialed PhD scientists working for rich American tech companies.

Although it may sound like one, I honestly don't intend this as a strawman, but as a, "I pretty much ALWAYS expect research to occur in that way." I pessimistically think most very deep fields that require significant domain background are largely out of reach of "citizen scientists" nowadays.

But even by the standard of doing research: I'm a rather uneducated (by the standards of a PhD AI researcher) programmer who has given multiple talks on AI models I developed, and I use relatively deep functionality on a day-to-day basis that I couldn't hope to implement end to end by hand. Depending on where you draw the line of democratization, I think we're already well into the promised land, and I'm only seeing continued positive improvements in this respect as well.


The analogy is a bit mixed up.

Driving a car has been democratized. Actually going in and fixing/modifying a modern car is increasingly less accessible/democratized, though still possible with quite a bit of training. Car internals are getting more complex every generation.

Likewise, using Google search / getting things recommended to you is obviously widely accessible. It is unclear how accessible ML research / going into the guts of a model actually is.

While I don't doubt you've given talks about applying ML, I do have a little skepticism that the talks went any deeper than just application-related topics.


"While I don't doubt you've given talks about applying ML, I do have a little skepticism that the talks went any deeper than just application-related topics."

Let me correct that misconception; every aspect of the algorithm was custom, from tokenization through classification through smoothing and postprocessing. It was a simpler model and process, but it was not exclusively application related. Far simpler than many of the deeper models I consume as black boxes, for sure, but I'd defend my competencies in that I'm not exclusively a plug-and-play data engineer :)

Let me also emphasize that I intentionally called out "driving" as opposed to "fixing". In fact I can't think of any car-fixing research going on just about anywhere, although I'm sure there is some. My core point was that _using_ the tooling is key, and that it has been democratized in both driving and AI (the application-layer aspect, as you put it). The research point was merely an add-on: I've also seen a democratization of the knowledge needed to do even a basic level of novel algorithm development, even if I'm writing deep-network papers or something of that ilk; I just find that to be the exception rather than the rule (per my "hardcore research is hard for the non-deeply-initiated once the low-hanging fruit has been pruned" comment).


To be clear, many of the things you just mentioned are feature engineering-related and don't have much to do with ML.

Car construction/modification research is going on. It's what car companies are doing...

Driving is not a valid analogy because by that logic AI was democratized 10 years ago because everyone "uses" Google search.


There is a vast ecosystem of free and open source projects that offer access to statistical and machine learning techniques now usable by non-specialists. I can now easily build recommendation systems scaling to billions of ratings, do facial recognition with a high degree of accuracy, detect fraud, or identify all the things in a photo (thanks ImageNet!).

I can build models to predict customer churn or ad CTR performance, extract summaries from large text documents, use unsupervised clustering to identify potential customer segments, determine if an image contains nudity, or if a 140 character tweet contains positive or negative language.

Open access has come a long way in a short period of time. It was only five years ago that Facebook introduced auto-tagging friends in photos (facial recognition), a feature that's now taken for granted but was once the realm of bad CSI TV shows and Las Vegas casinos.

Google, Facebook, Amazon, Microsoft, and IBM even started a consortium this September to share research [0].

"AI has been democratized" might be a bit overzealous, but I'm perfectly happy with the offerings on the table right now.

[0] https://www.partnershiponai.org/
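
To underline the accessibility point, one of the items above (unsupervised clustering for customer segments) really is only a few lines with scikit-learn; the features here are made up.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # columns: orders per month, average basket size, months since signup (toy data)
    customers = np.random.rand(500, 3) * [10, 200, 36]
    segments = KMeans(n_clusters=4, random_state=0).fit_predict(
        StandardScaler().fit_transform(customers))
    print(np.bincount(segments))   # rough size of each segment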


Because that isn't true?

Looking at the first thing (GANs): developed by Ian Goodfellow, who is now at OpenAI. The first linked paper (https://arxiv.org/pdf/1605.05396v2.pdf) is out of University of Michigan and Max Planck Institute.

The second (https://phillipi.github.io/pix2pix/) is out of Berkeley, the third (https://arxiv.org/pdf/1609.04802v3.pdf) is out of Twitter (which I guess counts as a rich American tech company).

Google Brain/DeepMind/FAIR/MS Research do great work, but there is plenty of great work elsewhere too.

For example, just yesterday an implementation[1] of fast layer normalization for TensorFlow was posted on Reddit[2] by someone who says they don't really do C++ ("I am really new to CUDA and C++"). This can (sometimes) speed up training by 5-10x (!). That's democratization.

[1] https://github.com/MycChiu/fast-LayerNorm-TF

[2] https://www.reddit.com/r/MachineLearning/comments/5gt0wm/p_f...
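
For anyone wondering what layer normalization actually does, the idea in plain numpy (not the linked fast CUDA kernel) is just: normalize each sample across its features, then rescale with a learned gain and bias.

    import numpy as np

    def layer_norm(x, gain, bias, eps=1e-5):
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gain * (x - mean) / np.sqrt(var + eps) + bias

    h = np.random.randn(32, 256)                     # a batch of hidden activations
    out = layer_norm(h, gain=np.ones(256), bias=np.zeros(256))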


No? Goodfellow has a PhD from UMontreal (one of the deep learning academic powerhouses), advised by Bengio. He then joined Google Research and then left for OpenAI (which has a billion dollars in funding and is backed by YC/Elon Musk).

The first author of that paper has a PhD from MIT and is a postdoc at Berkeley. Berkeley is also an academic ML powerhouse, and receives millions per year in industry funding just for ML. The first two authors are also former Microsoft Research interns.


It was mostly in response to the "working for rich American tech companies" part.

Yes, well credentialed people continue to do great work, even when more people are working in the field.


Sure, but my point still stands about all of these people having worked at rich tech companies or being funded by rich tech companies.


The tools and results are now freely available to the masses. But talent isn't democratized.


Detailed explanations of state of the art techniques. Huge archives of learning data. Massive amounts of cheaply available computing power. All of these are available to anyone on the internet. Anyone can get involved.

However, not everyone has equal skills, experience, and insight. And as you'd expect, the people with the best of all three are highly credentialed and are being paid highly for their valuable efforts by companies which can afford to do so.

The days of an untrained tinkerer in a back shed contributing to the state of the art are long gone.


There are also a lot of PhD scientists working at rich Chinese tech companies.


This is impossible to respond to without engaging in political analysis (particularly around equality of opportunity), but the reason that a small number of people do the research is largely that it's both expensive and requires a lot of highly specialized knowledge, which typically implies extensive postgraduate education.

Credentials can be abused, sure, and they're a shorthand for a more complex set of concepts, but they are on the whole a very good thing.


Because those techniques are then available to anyone with a similar problem. Research isn't democratized but the product is.


> "In order to be able to have fluent conversations with machines, several issues need to be solved first: text understanding, question answering and machine translation."

The article makes it sound like "text understanding" was just around the corner, maybe next year...

I doubt that because understanding (arbitrary but meaningful) text requires real intelligence, and AI is far away from that.

And if it really happens one day, then our jobs are all gone. Because programs are text, and with proper training programs are way easier to understand than arbitrary prose - a much smaller subset of concepts contained in a clear structure, instead of almost infinite concepts or even new ones, to be structured however the author sees fit.

So prior to understanding natural language, AI should be able to understand programming languages, because a programming language is just a small subset of language.


Seems like a matter of hierarchy. Current NNs are flat and tiny. We're only simulating small chunks of the base levels. There is no executive level that integrates from multiple minions. You need that hierarchy to even start considering "meaning". What we're doing now is making bricks for that future building.

Of course, then there's that other school of thought that predicates the true grasp of meaning on the participation of consciousness. But that's another can of worms for another time and/or another forum.


Not all neural nets are flat, e.g. Recursive/Tree Neural Nets.


Florin_Andrei is clearly including recursive/tree NNs when (s)he says current NNs are flat (and, depending on how you define "flat", I think this is accurate).


Programming languages are already an intermediate piece of text that computers understand really, really well. The problem is always one of human intention, and crafting the correct program based upon what the human intends to be saying, and not necessarily what they actually are saying.


Has there been any advancement in an "AI executive" lately? I.e., a layer that sits above a number of other networks and drives them towards structured goal seeking like reproduction, food, or pain avoidance?


The closest thing may be the work going on in "active learning", where the learning process is self-directed, the agent adapts to limitations inherent in the data, and it employs multiple techniques as needed, especially interactive Q&A to resolve unlabeled or ambiguous exemplars.

Until an agent takes some sort of initiative in directing its learning process, its behavior will remain scripted and passive, never amounting to more than a recognizer of patterns -- a rather low bar in the quest for autonomous intelligence, IMHO.
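
A toy sketch of one simple flavour of this, uncertainty sampling: the model itself picks which unlabeled examples it wants labels for next (scikit-learn, synthetic data).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    labeled = list(range(10))                                  # a handful of seed labels
    pool = [i for i in range(len(X)) if i not in labeled]

    for round_no in range(5):
        clf = LogisticRegression().fit(X[labeled], y[labeled])
        uncertainty = 1 - clf.predict_proba(X[pool]).max(axis=1)   # least-confident sampling
        ask = [pool[i] for i in np.argsort(uncertainty)[-10:]]     # the "interactive Q&A"
        labeled += ask                                             # oracle supplies labels
        pool = [i for i in pool if i not in ask]
        print(round_no, clf.score(X, y))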


I'd be happy with a framework which could decide: this is a point in the free-form conversation to answer a question or interject some information.


I think when we finally start building hierarchies, when NNs "go meta", that's when we'll start seeing truly mind-boggling results.

There can be no AGI without hierarchical, "meta" NNs.


By most definitions, people have been building hierarchical NNs of many different kinds for years.


this is a great idea. then maybe we can put an executive-executive layer above the executive layer. then we would be getting somewhere


No mention of the PixelCNN WaveNet VPN ByteNet line of research?


And AlphaGo. I especially liked "Decoupled Neural Interfaces using Synthetic Gradients" because it seemed to open up training of complex models on multiple machines, but also because it provided an unexpected insight into how local learning can be done.


I'm still waiting to see synthetic gradients get applied anywhere else. When it came out, I thought it would at the very least revolutionize training of RNNs by giving them better BPTT, but it seems to've sunk without a trace.


Wow, amazing!

I like the idea of optimizing for efficiency too. Make the neural net's scoring function a little more meta: how well did you score, and how much energy (operations or neurons) was consumed?

Good for the folks at home with smaller systems, and it frees up resources for more... what else, neural networks.
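
A toy sketch of what that "more meta" score could look like, with parameter count standing in for energy and made-up numbers:

    def meta_score(accuracy, num_params, lam=1e-8):
        return accuracy - lam * num_params      # reward accuracy, penalise size

    candidates = [("small_net", 0.90, 1000000), ("big_net", 0.92, 60000000)]
    best = max(candidates, key=lambda c: meta_score(c[1], c[2]))
    print(best[0])   # the small net wins despite slightly lower raw accuracy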


There is a similar post (and several discussions) on Reddit: https://www.reddit.com/r/MachineLearning/comments/5gutxy/d_t...


Self-driving cars, beating Go masters, but still no one has solved Ms. Pac-Man?

It's not like they haven't tried. There are multiple research papers on it, AI contests held for it, even Deepmind tried and failed.

I think the best score I have seen is around 40,000. The best human players can get 800,000.


I think "tried and failed" is unfair unless they've actually announced they're no longer trying.


Fair enough - but they have been working on video games for a few years now.

It seems especially ironic given that DeepMind is a media golden child and received a ton of press for beating video games. At the time they didn't hide the fact that Ms. Pac-Man was not yet cracked, but not one article mentioned it.


Speaking as an interested techie without any direct experience with deep learning techniques, this is a terrific overview. Thanks!


Interesting how deep learning is evolving. Agreed that adversarial models are a huge advancement; it will be interesting to see how they progress over the next couple of years.


Yeah, GANs are huge. Mimicry plays a huge part in human learning, especially in language learning, where native fluency is more or less defined as 'difficult to distinguish from the real thing'. I expect we'll see a lot more coming from adversarial models in the future.
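
For anyone who hasn't seen the adversarial game written down, a bare-bones sketch on 1-D toy data (Keras, made-up sizes): the discriminator learns to tell real samples from fakes, while the generator learns to fool it.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    G = Sequential([Dense(16, activation='relu', input_shape=(8,)), Dense(1)])
    D = Sequential([Dense(16, activation='relu', input_shape=(1,)),
                    Dense(1, activation='sigmoid')])
    D.compile(optimizer='adam', loss='binary_crossentropy')

    D.trainable = False                     # freeze D inside the combined model only
    gan = Sequential([G, D])
    gan.compile(optimizer='adam', loss='binary_crossentropy')

    for step in range(2000):
        real = np.random.normal(4.0, 1.0, (64, 1))            # "real" data ~ N(4, 1)
        fake = G.predict(np.random.normal(0, 1, (64, 8)))
        # Discriminator step: real -> 1, fake -> 0 (D was compiled before freezing).
        D.train_on_batch(np.vstack([real, fake]),
                         np.vstack([np.ones((64, 1)), np.zeros((64, 1))]))
        # Generator step: push D's output on fresh fakes towards 1, through frozen D.
        gan.train_on_batch(np.random.normal(0, 1, (64, 8)), np.ones((64, 1)))

    print(G.predict(np.random.normal(0, 1, (5, 8))).ravel())   # should drift towards ~4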


The Curious AI Company's Tagger and Ladder network research looks interesting. They claim the same accuracy with far less labeled data.


What are the advancements in grounded deep learning for agents that integrate multiple senses and abstraction across domains? Or has anyone even tried to do that?


Whoa, those images kind of weirded me out.


Indeed. Sort of a new form of computer generated art. Some of them I found quite unsettling.


And people complained/are complaining about javascript fatigue / churn... ha!



