Anyone following DL news knows that DL alone will not lead to strong AI. The most impressive feats in the last year or so have come from combining deep artificial neural networks with other algorithms, just as DeepMind combined deep ConvNets with reinforcement learning and Monte Carlo Tree Search. There's not really an interesting conversation to be had about whether DL will get us to strong AI. It won't. It is just machine perception; that is, it classifies, clusters and makes predictions about data very well in many situations, but it's not going to solve goal-oriented learning. But it solves perception problems very well, often better than human experts. So in the not too distant future, as people wake up to its potential, we will use those infinitely replicable NNs to extract actionable knowledge from the raw data of the world. That is, the world will become more transparent. It will offer fewer surprises. We may not solve cancer with DL, but we will spot it in X-rays more consistently with image recognition, and save more lives.
I understand and empathize with the skepticism, or rather with the criticism of the hand-wringing, about the implications of current deep learning methods.
However, as someone who builds them for vision applications I'm increasingly convinced that some form of ANN will underlie AGI - what he calls a universal algorithm.
If we assume that general intelligence comes from highly trained, highly connected single processors (neurons) with a massive and complex sensor system, then replicating that neuron is step one - which arguably is what we are building, albeit comparatively crudely, with ANNs.
If you compare at a high level how infants learn and how we train RNN/CNNs they are remarkably similar.
I think where the author, and the ML crowd in general, focuses too much is on unsupervised learning as being pivotal for AGI.
In fact if you look again at biological models the bulk of animal learning is supervised training in the strict technical sense. Just look at feral children studies as proof of this.
Where the author detours too much is assuming the academic world would prove a broader scope for ANN if it were there. In fact however research priorities are across the board not focused on general intelligence and most machine learning programs explicitly forbid this research for graduate students as it's not productive over the timeline of a program.
Bengio and others are, I think, on the right track, focusing on the question of ANNs toward AGI, and I think it will start producing results as our training methods improve.
First, I'm curious where you thought I was focused on unsupervised learning? It certainly didn't cross my mind when I was writing this --- I was (implicitly) strictly talking about supervised machine learning.
My post actually is in support of what the latter part of your comment says, in a roundabout way. In general, the people that are making huge strides in deep learning (Bengio, Hinton, and LeCun are obviously the big three) understand the capabilities and, maybe more importantly, the limitations of DL. My main point is that the ML community at large is actually not on the same page as the experts, and that causes many more problems.
I want us, as a community, to stop treating deep learning any differently than any other ML algorithm --- have a consensus, based on scientific facts, about the possibilities and limitations thus far. If we, "the experts", don't understand these things about our own algorithms, how can we expect the rest of the world to understand them?
> I want us, as a community, to stop treating deep learning any differently than any other ML algorithm --- have a consensus, based on scientific facts, about the possibilities and limitations thus far. If we, "the experts", don't understand these things about our own algorithms, how can we expect the rest of the world to understand them?
I agree. It's interesting watching the "debate" around deep learning. All the empirical results are available for free online, yet there's so much misinformation and confusion. If you're familiar with the work, you can fill in the blanks on where things are headed. For instance, in 2011, I think it became clear that RNNs were going to become a big thing, based on work from Ilya Sutskever and James Martens. Ilya was then doing his PhD and is now running OpenAI, doing research backed by a billion dollars.
The pace of change in deep learning is accelerating. It used to be fairly easy for me to stay current with new papers that were coming out; now I have a backlog. To a certain extent, it doesn't matter what other people think; much of the debate is just noise. I don't know what AGI is. If it's passing the Turing test, we're pretty close, 3 years max, maybe by the end of the year. Anything more than that is too metaphysical and prone to interpretation. But there have been a bunch of benchmark datasets/tasks established now. ImageNet was the first one that everyone heard about, I think, but sets like COCO, 1B words, and others have come out since then and established benchmarks. Those benchmarks will keep improving; pursuing those improvements will lead to new discoveries re: "intelligence as computation", and something approximately based on "deep learning" will drive it for a while.
> If it's passing the Turing test, we're pretty close, 3 years max
Well yes? If a Turing test you realize the simulation of some idiot in the online chat, it has long been there - and nobody wants. But the system, which can lead a meaningful conversation, today there is no trace. And there is even no harbingers of its occurrence.
<- This was translated by Google Translate from a piece of perfectly intelligible and grammatically correct text in another language. If this is the state of the art in machine translation, how on Earth can you expect a machine that can converse on human level in three years?
Sadly, I read the Google-translated text and it read like it was written by someone whose second language is English. I didn't realize it was an "example" until I read the next part of your comment. So it had me fooled.
You could probably replace 90% of YouTube comments with a simple trigram-based chat bot and no one would notice. But that's hardly a good measure of AI quality.
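(For the curious, a trigram bot really is that simple. A minimal sketch in Python, with a toy corpus standing in for scraped comments; everything here is made up for illustration:)

    import random
    from collections import defaultdict

    def train_trigrams(tokens):
        # Map each pair of consecutive words to the words seen to follow them.
        model = defaultdict(list)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            model[(a, b)].append(c)
        return model

    def generate(model, seed, length=20):
        # Repeatedly sample a continuation for the last two words.
        out = list(seed)
        for _ in range(length):
            continuations = model.get(tuple(out[-2:]))
            if not continuations:
                break
            out.append(random.choice(continuations))
        return " ".join(out)

    corpus = "this video is great I love this video so much".split()
    model = train_trigrams(corpus)
    print(generate(model, ("this", "video")))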
Although, your comment illustrates the main problem with the Turing Test. It depends on too many factors and assumptions that have nothing to do with the AI itself.
A good AGI test should be constructed in such a way that any normal person passes it with 100% certainty and no trivial system can pass it at all.
Over the past 10 years (deep learning got going in ~2006, really), the state of the art has improved at an exponential rate, and that's not slowing down. There are plenty of reasons to be bullish. Or at least, now seems like a bad time to leave the field.
I've read those papers when they came out. Correct me if I'm wrong, but they were not peer-reviewed.
The first one looks very impressive from the examples they've provided, but extraordinary claims require extraordinary proof. I will believe it only when I see an interactive demo. It's been nearly a year and I haven't seen it surfacing in a real product or usable prototype of any sort. Why?
Somehow all the papers that have "deep neural" stuff get 1/100th of the scrutiny applied to other AI research. I don't see anyone hyping up MIT's GENESIS system, for example.
The second paper has a really weird experiment setup. The point of one-shot learning is to be able to extract information from a limited number of examples. The authors, however, pretrain the network on a very large number of examples highly similar to the test set. Whether or not their algorithm is impressive depends on how well it is able to generalize, and they're not really testing generality -- at all. Again, why?
I'm curious where you thought I was focused on unsupervised learning?
I suppose I should have stated more clearly that I was talking more broadly; however, the discussion about hand training being what holds the discipline back implies, I think, that unsupervised learning is the key.
Perhaps I misunderstood your writing, but I think there is a nuance there that isn't well discussed.
> In fact if you look again at biological models the bulk of animal learning is supervised training in the strict technical sense. Just look at feral children studies as proof of this.
Feral children are incapable of learning a language as adults. But children don't require input (or feedback) from an existing language in order to "learn" a language; a community of children is capable of developing a full natural language without exposure to adult language. The problem the feral children encounter is not that no one teaches them to speak, it's that they're alone.
This is much as if you gave a neural network "supervised training" by letting it interact with several other untrained copies of itself, but no coded data. Does that meet your definition of "supervised training"?
(Obviously, a language created by an isolated group of children will be a new language rather than a known one. Children learning an existing language do need input, but there are some interesting quirks to the process -- they don't seem to need feedback, or use it if they get it, and in the natural course of things the input they get is 100% "correct" and 0% "incorrect". Try training a neural net to discriminate dogs from non-dogs when you can't code anything as "not a dog".)
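(Tangent for the ML-minded: the closest machine-learning analogue to learning from positive examples only is one-class / novelty detection, where nothing is ever coded as "not a dog". A rough scikit-learn sketch with made-up toy features, not a claim about how children actually do it:)

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Only positive ("dog") feature vectors exist; nothing is labeled "not a dog".
    rng = np.random.default_rng(0)
    dog_features = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

    # Fit a boundary around the positives alone.
    model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(dog_features)

    # At test time the model still flags outliers: +1 = looks like the training
    # distribution, -1 = does not, despite never having seen a negative example.
    queries = np.array([[0.1, -0.2], [6.0, 6.0]])
    print(model.predict(queries))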
Does that meet your definition of "supervised training"?
I think as a second layer, yes.
To your original statement though, being alone is the same as no-one teaching them to talk.
Adults, and other children, in fact teach infants to talk. It might not seem explicit (though there are very explicit programs for infant language development) but in fact it works the same way. See my other comment about object segmentation and classification.
> To your original statement though, being alone is the same as no-one teaching them to talk.
No it's not. You can have one without the other. Being alone implies no one teaching them to talk. However, children who are not alone still learn to talk despite no one teaching them.
>However, children who are not alone still learn to talk despite no one teaching them.
This is where we disagree. The distinction is that you probably see training/teaching as intentional - but it's not always. You can (and do) learn just by watching and listening without it being an explicit procedural training.
Again, congenitally deaf children who cannot learn any language spoken by the adults around them (since they can't perceive the speech) nevertheless create sign languages among themselves when they are part of a community of deaf children, and those sign languages display the full complexity and expressivity of any natural language. These de novo languages cannot have been taught to the children because they didn't exist to be taught. They are a product of the interaction between children, a side effect.
The same thing happens among the children of pidgin-speaking communities -- they receive input that does not belong to any language and, acting as a group, construct a full language from that. (This process is called creolization.) But the case of deaf children is especially clear because they don't even receive the malformed linguistic input that creolizing children do.
>If you compare at a high level how infants learn and how we train RNN/CNNs they are remarkably similar.
That doesn't sound correct. Infants engage in active inference to resolve their uncertainties, engage in transfer learning between different domains by default, and use their transfer-learning abilities to learn new categories or domains from even just one example.
So I would say - kind of. But at its root, the mechanisms are based on having a social network of people touching, talking, playing etc... that does the training - though it might not look the same.
A perfect example I use all the time, which is much later in development but relevant, is when a 1.5-year-old will point at something and ask if it's a [thing]. The initial training was people segmenting (holding it, moving it different ways) and classifying (calling it [thing], asking for [thing], telling the subject it is [thing]), which was then reinforced/weighted by asking if [thing] was indeed [thing].
So it doesn't look like explicit training, but it follows all the same steps.
"Here is my personal answer to the second question: deep neural networks are more useful than traditional neural networks for two reasons:
The automatic encoding of features which previously had to be hand engineered.
The exploitation of structurally/spatially associated features.
At the risk of sounding bold, that’s it — if you believe there is another benefit which is not somehow encompassed by these two traits, please let me know."
Let me ask a very simple question. What set of hand-engineered features gives <5% error on ImageNet?
Exactly --- none. But those features were born out of brute forcing a spatial exploitation, not some magical connection that we humans never thought about previously, which reinforces the point.
I would go farther and say that the success of deep learning comes mostly from one thing: putting convolution layers in neural nets, instead of just random connections or fully-connected layers.
Google's image and video recognition? Deep Dream? That's all convolution.
Speech-to-text? That's convolution.
AlphaGo? That's convolution.
Convnets are a great advance in machine learning, don't get me wrong. I hope that soon we get a generalizable way to apply convolution layers to text or music.
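To make "that's all convolution" concrete, here is roughly what such a model looks like; a minimal Keras-style sketch with toy input shapes, not any particular production architecture:

    import tensorflow as tf

    # A small image classifier whose useful capacity comes almost entirely
    # from the convolution layers rather than from exotic machinery.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=5)  # given a labeled dataset

Swap the convolution layers for fully-connected ones and performance on images collapses, which is the point.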
Speech-to-text? Not really. There have been good gains from adding convolutional layers, particularly for noisy speech, but the big big breakthroughs have been deep fully-connected layers and more recently exotic recurrent designs (LSTMs and the like).
Yeah, I should not make such sweeping statements. (Or maybe I should, because it's a great way to find out what other people's perspective is when they show up to correct me.)
My concern about the uses of convnets on text that I've seen is that I don't think they can deal with little things like the word "not". (The Stanford movie review thing can definitely handle the word "not", but that's different.) I'm unconvinced as of yet that we're convolving over the right thing. But maybe the right thing is on the way, especially when Google gave us a pretty good parser. And maybe the right thing involves other things like RNNs, sure.
I guess image recognition could have similar cases, and the results just look more impressive to me because I work with text and not with images.
I think there are some papers out of the IBM Watson group on question answering where they use ConvNets. I don't remember looking at the negation case specifically, but Question Answering generally has cases where that is important.
Again: convolutional filters existed long before CNNs did.
Yet no one solved ImageNet with some convolutions and hand-engineered features. You can't get that performance even if you shift a set of hand-engineered features across an image in a convolutional fashion. No one solved it with any previous approach. (Are we to believe that CNNs are the very first method to ever try to exploit some spatial structure...?)
So deep networks do bring things to the table beyond hand-engineered features and you are simply wrong.
I think there is a categorization error here. Deep architectures are possible for many classes of learning algorithms: deep Bayesian nets, restricted Boltzmann machines, graphical models, multilayer perceptrons, etc. Convolutional neural nets and LSTMs are just two other types, and they exploit spatial and temporal structure explicitly.
My top reasons why everyone getting into maths/stats/cs should go straight for deep learning:
a. recent findings are documented incredibly well in both research and code
b. because of its success, there are many areas for useful contribution at relatively less effort from the researcher
c. because of its success, it'll help you develop marketable skills
d. it's fun
Maybe it won't solve General AI, but it seems like a damn good foundation for the person/people that will eventually come out with ideas that move us closer in that direction.
Which paper/blog post/book specifically documents the most recent research? E.g. just yesterday I figured out (after surfing the internet for a long time) that the vanishing gradient problem has essentially been solved by ReLUs, so auto-encoders are no longer necessary and we can train a whole deep network at once. Another example is batch normalization. Probably not news for a researcher, but where could I learn about these updates?
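(In case it helps anyone else catching up: the ReLU fix is easy to convince yourself of numerically. The sigmoid's derivative is at most 0.25 and shrinks toward zero when units saturate, so gradients die as they are multiplied through many layers, while ReLU's derivative is exactly 1 on the active side. A toy sketch:)

    import numpy as np

    def sigmoid_grad(x):
        s = 1.0 / (1.0 + np.exp(-x))
        return s * (1.0 - s)          # at most 0.25, tiny for large |x|

    def relu_grad(x):
        return (x > 0).astype(float)  # exactly 1 wherever the unit is active

    x = np.array([-5.0, 0.0, 5.0])
    print(sigmoid_grad(x))  # ~[0.0066, 0.25, 0.0066]
    print(relu_grad(x))     # [0., 0., 1.]

    # Multiplying per-layer factors across a 20-layer stack:
    print(0.25 ** 20)  # ~9e-13 -> the "vanishing" gradient
    print(1.0 ** 20)   # 1.0    -> ReLU keeps the signal alive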
The only way to really stay caught up is to read the papers that are being released (NIPS, ICML, ICLR, CVPR, etc.). If you use a course or guide, then you're behind by definition. If you're okay with being a little behind, then the Deep Learning courses for NLP or Computer Vision are both excellent. NYU also has similar courses. You can also try following some major Deep Learning people on Facebook/Twitter, as they post about the new stuff daily. People to follow: LeCun, Lawrence, Bengio, Cho, Karpathy, just to name a few. Interested to hear others' ideas on how to stay up to date.
I disagree that a. is true. When I have read DL papers I have been very aware that there is not enough detail to replicate exactly what was done. I often come away with the feeling of just being given the gist of what was done. I think this is a combination of papers that are too short, and possibly researchers wanting to maintain competitive advantage.
My impression of DL papers is that they are relatively clear compared to other papers in machine learning/math/stats. Especially for big conferences, many of the accepted papers have code released on GitHub before the conference date, either by the original researchers or by developers looking to get stars (usually googling the paper title + github will help you here). A lot of these implementations end up turning into blog posts or being contributed to Theano/TensorFlow/Torch/Caffe/etc.
I don't see a huge competitive advantage in not releasing code. It's probably more beneficial to share ideas and collaborate. I imagine the reason it's not done more often is that it takes time and effort to put this stuff out there.
"The automatic encoding of features which previously had to be hand engineered."
Yes, that is the main benefit.
The drawback is that we are still hand tuning architectures, slowly inventing (or incorporating) things like LSTM and the like into the model.
One goal would be to achieve a universal building block that can be stacked/repeated without the need for architectural tuning.
Maybe something that combines recurrence, one-shot learning, deep learning, and something stolen from AI (like alpha-beta, graph search, or something self-referencing and stochastic with secondary neural networks) into a single "node". Then we won't have to worry about architecture so much.
Why are we so preoccupied with the notion of "artificial intelligence" in the first place?
Artificial intelligence, if it can even be defined, does not seem like a particularly valuable goal. Why is emulating human cognition the metric by which we assess the utility of machine learning systems?
I'd take a bunch of systems with superhuman abilities in specialized fields (driving, Go, etc.) over so-called "artificial intelligence" any day.
Driving and Go were considered waaay deep into "hard AI" just 10 years ago. Even if you mean to ask this question about "general artificial intelligence", you only need to go as far as speech or text comprehension for an example of something that needs to behave more or less like a person.
> Driving and go were considered waaay deep into "hard AI" just 10 years ago.
Oh, I know that. In fact, I think there are some very useful and very hard tasks which even AGI (at least a basic one) would not be able to solve.
My point is that we should focus on applications, not person-ness. Whether a technique is similar to human cognition or not, or the result is similar to a person, is immaterial.
Put simply, I don't think it matters much if all our super-powerful machines are ultimately Blockheads.
There is value in trying to emulate the human mind, which lies in learning how the human mind works. A proper simulation of human intelligence would be an invaluable building block for health fields like psychology. Of course this is a very different kind of AI, and the focus on application-specific measures of AI success has distracted from this goal.
No, being able to emulate a human is not necessary for understanding one and communicating with one. We can have great text comprehension (maybe even superhuman, won't that be fun?) and conversational interfaces without online learning, or without any overall model of interacting with the world.
> Because specialised systems are a subset of AGI (in the sense that GI is capable of producing them), hence AGI is seen as the ultimate goal.
I don't think one necessarily follows from the other though. We might be able to make an AGI with the intelligence of a human teenager. That doesn't necessarily mean it would be better at driving a truck than a more specialized algorithm.
> That would probably mean testing machine learning systems with something like Turing tests which no one is doing or advocating.
I see a lot of people complaining that various machine learning techniques won't lead to AGI and would never create a system which could pass the Turing test. Even this article laments that neural nets don't actually resemble human cognition.
> I don't think one necessarily follows from the other though. We might be able to make an AGI with the intelligence of a human teenager. That doesn't necessarily mean it would be better at driving a truck than a more specialised algorithm.
I agree that an AGI wouldn't necessarily be good at driving trucks. I meant it in the sense that an AGI would be capable of producing the "truck driving algorithm" (if we humans can do it, the AGI can do it too, almost by definition).
> I see a lot of people complaining that various machine learning techniques won't lead to AGI and would never create a system which could pass the Turing test.
Most of the complaining seems geared towards media and outsiders who portray X technique as being the "Solution to AGI", not towards the techniques themselves.
I think humans can do some cool things that we haven't yet figured out how to make an algorithm for. We can explore our environment, perform experiments, make hypotheses, and ultimately generate theories which have predictive power. Algorithms that can do that would be valuable and would have similar properties to human intellect.
One way to think of AI is what you say, highly specialized systems that have become very good at a specific task. This is very useful. It is also cool because if you can define your problem well and come up with a fitness function, then the algorithm solves your problem for you.
Another way to think about AI, and this is a much more general type, is that it could possibly learn new things and create new theories by performing its own experiments, much like a human scientist does.
> Algorithms that can do that would be valuable and would have similar properties to human intellect.
I agree. A "scientist" algorithm would be quite cool and helpful.
Where I disagree is whether that scientist algorithm should necessarily have intelligence that in any way resembles humanity. For all I care, it doesn't even need to be able to talk to be useful.
I disagree with your disagreement. Talking - language - is intimately connected to "science", which, as a form of human knowledge, is about reducing the complexity of the universe down into semantic constructs that humans can appreciate. That is, "scientific knowledge" does not exist except in the form of human language. There is no other meaning to "science" (or "knowledge") except "making the universe intelligible to humans".
A machine must also labor under these constraints, and so ultimately must produce knowledge in the form of language/semantics in order to be doing "science".
Limiting science to knowledge expressed in human language seems unreasonable. I think that it's the degree to which a system can understand and predict the world that matters - not whether it can explain it to us.
Might as well say that German mathematics isn't real mathematics, because it's all incomprehensible gobbledegook to someone who doesn't speak German.
Scientific knowledge already isn't expressed in normal human language; scientists build up a language of mathematical symbols, diagrams, and domain specific terms to encode and communicate their understanding in a more effective format. After all, how well do you understand a thing if you can't explain it?
I suspect the real issue will be difference in mental skills... there might not be a human who can understand what the AI understands, even if it is very good at explaining.
Sure, but unless they're willing to communicate their benevolent intent to us, alien science should be viewed as hostile. What is the point of a machine science that is not amenable to human examination? How could we be sure that the machine is acting in concert with human interests?
A purely machine science is anathema; if the machine does science, it must participate in the community of human scientists to validate its conclusions, or else its 'knowledge', however encoded, is as obscure and terrifying as the path of an impending asteroid.
Generating theories which have predictive power is exactly what deep neural nets do. Reinforcement learning systems do explore their environment and perform experiments, it's just that we give them access to a limited environment and reasonably clear objectives.
Really, we cannot draw a clear line between a learning algorithm and intelligence. Any algorithm that can incorporate information from the world it is exposed to into itself and use it to predict that world is "intelligent" in a sense, and all machine learning algorithms do that.
>Generating theories which have predictive power is exactly what deep neural nets do.
Not really. Some symbolic/inductive systems do that. Traditional deep neural nets have a single model that is being adjusted to fit the distribution of input samples. That is not the same as creating a theory, let alone many theories.
> Reinforcement learning systems do explore their environment and perform experiments
Since when does trying something 1000 times with incremental changes qualify as experiments and exploration?
That single model is in the important respects as powerful as any model can hope to be, since it can approximate any continuous function. Sure, it's finite, but so is my computer, and my brain.
Sure, we can't guarantee that the ways we have of training this model will be able to learn these function approximations. But the same is true of my brain.
So why would multiple models be better? Any set of multiple models can be aggregated and called one model, anyway.
Being adjusted to fit the distribution of input samples sounds unimpressive, until you remember it can in principle do it for all distributions. And it learns to generalize from the input samples to unseen samples. Making a model to generalize from what we've seen to what we haven't yet seen is exactly the same as creating a theory, as I see it.
Since when does trying something 1000 times with incremental changes qualify as experiments and exploration?
Since when doesn't it? If it's 100 times, is it experimentation and exploration then in your eyes? What about 10 times? Sounds like an arbitrary distinction to me.
The model behind ANNs defines how the system maps its inputs to its outputs. A theory, on the other hand, is something that describes an aspect of the environment/sample data regardless of system's outputs. One benefit of having several theories is that you can compare, test and invalidate them.
...
If your definition of "experiment" includes stuff like practicing a baseball pitch 1000 times, it is too generic to be meaningful. Experimentation implies purposeful gathering of knowledge by trying different things.
RL systems do experiment by that definition. For instance, DeepMind's first Atari player used an epsilon-greedy exploration/exploitation strategy: choose the action the model suggests is best with probability 1 - e, choose a random action (that is, "try different things") with probability e.
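The whole strategy fits in a few lines; a sketch (variable names hypothetical, not DeepMind's code):

    import random

    def epsilon_greedy_action(q_values, epsilon=0.1):
        # Explore: occasionally try a random action ("try different things").
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Exploit: otherwise pick the action the model currently thinks is best.
        return max(range(len(q_values)), key=lambda a: q_values[a])

    # Hypothetical Q-value estimates for 4 actions:
    print(epsilon_greedy_action([0.1, 0.7, 0.2, 0.05]))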
I don't know what you mean by the "system's output" when talking about theories, but it seems to me that even by your definition, the weights of the ANN can be understood as a theory.
Having several theories is the same as having one theory, "one of these theories is more correct", or possibly "some combination of these theories is correct". If you define it as one theory instead of several, you can still improve it. (As I recall, some RL systems have multiple "heads", in a way corresponding to multiple theories).
I really dislike this notion. It reminds me of either a giant evil computer hivemind or some sexy female robots. I think it is more of a pop-culture creation, the purpose of which is focused on sensational value that elicits fear or fascination, rather than a clearly defined concept.
What are "traditional nets" ? What are the "other learning algorithms" ? What is a universal algorithm (and for what)? Neural nets are universal function approximators. There isnt something [edit: a function] they can't learn. When stacked they seem to produce results that are eerily human-like.
I think the "universal algorithm" in the article refers to some kind of emergent intelligence. Well, nothing that he mentions precludes it. Our brains aren't magical machines. Neural nets may not model real neurons, yet it is amazing how they can produce results that we identify as similar to the way we think. There is nothing in computational neuroscience that comes close to this. If anything, the success of deep nets bolsters my belief in connectionism rather than the opposite. I would expect it is very difficult to formulate "intelligence" mathematically, and to prove that DNs can or cannot produce it.
SVMs, AdaBoost & co, random forests - just to name a few popular ones.
>Neural nets are universal function approximators. There isn't something they can't learn.
Which types of functions you can approximate has nothing to do with being able to "learn everything". Simple neural nets are a great illustration of this point. Networks that can theoretically approximate the target function can easily overfit or oscillate if not set up properly.
In other words, the ability to approximate is the property of your model, while the ability to learn things is the property of your training algorithm.
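A toy illustration of that distinction, using a high-degree polynomial as a stand-in for any high-capacity model: it has enough capacity to represent the target, yet naive fitting just memorizes the noise.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 15)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
    x_test = np.linspace(0, 1, 100)
    y_test = np.sin(2 * np.pi * x_test)

    # Degree-14 polynomial: plenty of representational power for this target.
    coeffs = np.polyfit(x_train, y_train, deg=14)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(train_err)  # near zero: the training points are fit almost exactly
    print(test_err)   # much larger: approximation power did not buy generalization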
Absolutely right. By the same qualifications, it's possible, but also completely infeasible, to create a flat map that approximates any function. Because it is obvious that it won't be useful, nobody will write home about it.
The universal approximator approach is another big one that people lean on from the opposite direction (I suppose you could call it the "anti-deep learning" school of thought). Most people don't take the time to consider what a universal approximator actually is.
Are humans universal approximators? Absolutely not. The mere proof that an NN with an infinite number of hidden nodes (which isn't practical, which could end the discussion there) can find a statistical mapping between any two inputs and outputs is hardly a benefit. There is something so much more to intelligence --- the ability to generalize well (aka not overfitting) is first and foremost.
Trying to compare anything to "intelligence" is a trap, imho. Intelligence is not a natural quantity, it's a word. There was no reason for a brain to become a universal approximator, or to have any specific mathematical property for that matter. Brains evolved to adapt behavior to the environment using the senses. The universe didn't care for creating some property called "intelligence".
I do see many results in NLP for example that have an eerie human-likeness, even if nobody can explain why. I am not an expert in the field, but I would think it's a challenge to find similar results made using another technique.
> The universe didn't care for creating some property called "intelligence".
On the face of it, I would have to disagree. It would seem to me that's exactly what the universe selected for, among many other things of course, in our tiny little corner of it.
How about all the other universes that never generated intelligence?
But going back to our own universe, how widespread is intelligence? It's pretty much microscopic, isn't it? I think our universe seems to be mostly geared toward generating empty space, and the rare areas that are not empty are certainly extremely hostile to intelligence.
Predicting (function approximation) is perhaps the main use of neural nets but there is another very important use: generating, or in other words, imagination. And in generative mode we need to know about probabilities and latent variables - thus, we need a little more than neural nets to do this task.
Apparently function approximation is an important property to make something useful. It seems reasonable that it should be part of an intelligent system. Is there a similar "deep learning" based on k-nearest neighbor classifiers?
With kNN, if you have enough data and a good definition of "distance", then yes, it is in a way similar to "deep learning", just deep in the sense that it's data- and human-labor-intensive.
I'm oversimplifying here, but what linear regression and logistic regression did compared to kNN is that you can automate the "distance" function, but you still have to manually construct the features. What DL did is one step further -- don't even bother feature-engineering, the network can construct the features itself.
You see, there isn't a function that kNN can't approximate. If you have an impressive feature list and a training datum that has every feature exactly the same as your input, there is no reason not to directly use the output of said datum. It's the feasibility that matters.
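To make that concrete: 1-NN with a hand-tuned distance "solves" XOR purely by memorization, which is exactly the feasibility problem. A toy sketch:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def weighted_distance(a, b):
        # Hypothetical hand-engineered metric: feature 0 matters 10x more than feature 1.
        return 10 * abs(a[0] - b[0]) + abs(a[1] - b[1])

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])  # XOR: non-linear, yet trivially "learned" by lookup

    knn = KNeighborsClassifier(n_neighbors=1, metric=weighted_distance)
    knn.fit(X, y)
    print(knn.predict(X))  # [0 1 1 0] -- perfect recall of the training set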
Of course, DL is a huge step forward. It has made "impossible things possible, and hard things easy". The author also acknowledged DL's importance. However, that doesn't mean we should stop at DL.
> Apparently function approximation is an important property
'important' sure but not always a good idea. When you are learning from finite (but potentially very large) number of noisy examples, the universal function approximator will try to approximate the noise also.
Learning is a delicate dance between the capacity of your learner and the complexity of the thing that you are trying to learn. The key notion is that it takes two to tango. If you have few examples, you are better off not using deep nets.
Stated another way, a universal approximator has infinite potential for distraction. "Hey shiny" and it could veer off in a direction you don't want it to go. You want to pick out army tanks from pictures and it might learn to distinguish pictures taken on a cloudy day from those taken on sunny days.
You missed my point entirely. My point is that showing something is a universal function approximator indicates nothing about whether that method is a good method in general. Necessary but not sufficient. kNN is one of the dumbest/simplest methods out there.
There is new research going on into Neural Turing Machines, which, in theory, can do exactly that. Maybe we have to wait some time to see NNs sort numbers.
The issue here isn't whether there is something deep learning can't learn -- with enough data and engineering even nearest neighbor can learn everything.
The problem is, would deep learning scale efficiently enough that it is feasible to learn the "universal algorithm" with DL?
I think it's been proven that neural nets with more than 2 hidden layers can approximate any function. Is there any literature for other algorithms as well?
It is true that neural networks can approximate any function. But what are feasible ways to construct such networks? Part of the point of having "deep" networks is that it's easier to train that way. Scalability does not only refer to the scale of the model itself, but also to the difficulty of building such a model.
> There isnt something [edit: a function] they can't learn.
This is way too sloppy. There isn't something neural nets can't be made to represent - but possibility of representation isn't the same as learning. Learning is not a property of neural nets, it is a property of parameter-adjusting algorithms like backpropagation which don't exactly match up with the representational universality of neural nets.
"Neural nets" have nothing to do with brains or "the way we think." This is marketing crapola.
the universal approx theorem is not sloppy. "Learning" and "Representation" are sloppy terms.
Neural nets have a little bit of something "to do with brains". They were inspired by connectionist ideas. I am all for avoiding marketing speak, but I don't find extremes useful.
I'm assuming the old feed-forward error back-propagation neural networks, before they got "deep". They had three layers (input, hidden and output) and were trained using straightforward error back-propagation (the derivative of the error with respect to the training weights).
This was a relatively popular technique in--I think?--the 90s (this is a guess, I started my masters ML in 2003 and at that moment neural networks were seen as a thing of the past). So at some point these traditional neural networks hit a wall. We wanted to use more layers but the error-backprop wasn't quite up to it. And as it later turned out, computing power and size of datasets were also lacking, but IIRC we didn't quite realize those aspects back then and mostly saw it as a limitation of the Neural Network training algorithm.
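For reference, such a "traditional" net fits in a page of NumPy; a rough sketch (modern NumPy API, not period code) of one hidden layer trained with plain backprop on XOR:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # input -> hidden
    W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # hidden -> output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for _ in range(5000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass: chain rule from the squared error back to each weight matrix
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    print(out.round(3).ravel())  # should approach [0, 1, 1, 0]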
The logic is written by humans. The main mechanism by which computers / robots begin to outperform people in eg playing Chess, Go or Driving, is copying what works.
Humans outperformed animals because they were able to try stuff, recognize what works and transmit that abstract information using language.
The main advantage of computers is being able to quickly and easily copy bits and check for errors. You can have perfect copies now, preserving things that before could only be copied imperfectly.
And now you copy algorithms that work. The selection process might need work but the actual logic is still written by some human somewhere. It's almost never written by a computer. Almost all the code is actually either written by a human or at most generated by an algorithm written by a human, which takes as input code written by another human.
What's the "smarter" thing is the system of humans banging away at a platform, all making little contributions, and the selection process for what goes into the next version. That's what's smarter than a single human. That and the ability to collect and process tons of data.
All the current AI does is throw a lot of machines at a problem, and stores the result in giant databases as precomputed input for later. That's what most big data is today. Whoever has the training sets and the results is now hoarding it for a competitive advantage.
But really, the thing that makes the whole system smart is that so many humans can make their own diff/patch/"pull request". Anyone can write a test and submit a bug that something doesn't work. That openness is what made science and open source successful.
Open source has served the long tail better, too. Microsoft builds software that runs on some hardware. Linux has been forked to run on toasters. Open source drug platforms would have helped solve malaria, zika and other diseases faster.
If we had patentleft in drugs, we'd outpace bacterial resistance. Instead we have the profit motive, which stagnated development of new drugs.
There is more to that than precomputed data, sometimes a network can point out mappings that were not thought of beforehand. Humans copy what works, we spend ~25 years perfecting that ability.
Aren't we simply storing the precomputed result of human-devised feature extraction algorithms?
Yes they can run unsupervised, and yes they run for multiple iterations. But they are written by humans for extracting features from data, not much different than, say, PageRank 20 years ago iterating to approximate the "popularity" of a page on the web.
This AI doesn't see a problem and come up with an algorithm to solve it. It uses algorithms written by humans to accomplish goals set by humans. Nearly all AI today is still simply a preprogrammed machine.
What I am saying, though, is that humans + a system becomes smarter than a single human over time because we find ways to express "what works" in such a manner that a massively parallel computing platform can then find a better answer, faster and more consistently. That's it. This is the algorithm written by humans and collectively refined by humans banging away at a platform.
And why open source would be better than the profit motive when it comes to drugs.
At the current level, i agree the networks are "sloppily copying" patterns. However there is no inherent reason why networks will not be able to learn how to formulate problems, and learn how to devise strategies. If we consult brain science for this, there are no striking architectural differences between the visual cortex (a sensory area) and the prefrontal cortex (supposedly a cognitive area), and, as such, we would expect the same methods that apply to simple vision problems to be applicable to higher-level cognitive functions.
Woah, what's with tying this all back to open source and drugs? That last part seems tacked on.
Anyways, computers definitely outdo people by orders of magnitude at computation, and it's only recently that they've been able to beat us at pattern recognition. Apply some inductive reasoning and see where that goes.
Besides, humans aren't exactly the perfect beings. Just like we can build cars heavier and faster than us, it's no problem building machines that are beginning to take over what we thought were uniquely human things.
Actually, that algorithm is the result of many iterations of a simple algorithm written by a human. It is itself the output of the algorithm, not too different than the machine code generated by a compiler after optimizations. The author says as much and names the algorithm-producing algorithm:
But at the core the approach we use is also really quite profoundly dumb (though I understand it’s easy to make such claims in retrospect). Anyway, I’d like to walk you through Policy Gradients (PG), our favorite default choice for attacking RL problems at the moment.
What the human is doing is identifying the problem, writing a solution, and simply letting the computer work out the parameters through various statistical methods. But the resulting algorithm itself is pretty much in the narrow class of algorithms that were already described by the human. The computer just executed a dumb and straightforward search through a space of parameters, which itself was a simple preprogrammed algorithm.
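To be fair to the parent, that "dumb and straightforward search" really is only a few lines. A toy REINFORCE-style policy-gradient sketch on a two-armed bandit (payoff numbers invented, not the author's actual code):

    import numpy as np

    rng = np.random.default_rng(0)
    true_payoff = np.array([0.2, 0.8])   # hypothetical: arm 1 pays off more often
    logits = np.zeros(2)                 # the policy's parameters
    lr = 0.1

    for _ in range(2000):
        probs = np.exp(logits) / np.exp(logits).sum()        # softmax policy
        action = rng.choice(2, p=probs)
        reward = float(rng.random() < true_payoff[action])   # sample a payoff
        # REINFORCE update: grad of log pi(action) is one_hot(action) - probs
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        logits += lr * reward * grad_log_pi

    print(np.exp(logits) / np.exp(logits).sum())  # probability mass shifts to arm 1

The parameters end up encoding "pull arm 1", but the update rule itself was written by a human, which is the point being made above.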
Look, most of our science is also pretty much parametrized models, often with smoothness assumptions for calculus. Now with Deep Learning we may indeed find more interesting parametrized models. But that is a far cry from understanding abstract logical concepts and manipulating them to come up with entire algorithms from scratch to solve problems.
You seem to put a lot of weight on the notion that all these things can be reduced to essentially a search for parameters within a particular (not all-encompassing) solution space.
Do we have any good reason to suppose that a parametrized model isn't enough for everything we'd want, including a system that has human-level or higher intelligence and creativity? (assuming adequate structure that we don't yet know, that allows for a sufficiently large solution space)
We have good reasons to assume that we ourselves, the sum of a particular person's memories, skills, identity and intelligence, are contained within a particular set of parameters encoded by different biochemical means in our brains, and the process of how we learn skills, facts and habits is literally a search through that space of parameters.
"far cry from understanding abstract logical concepts" is more related to the types of problems we're tackling - symbolic manipulation and reasoning is a valid but very distinct field of AI, but it's not particularly useful for these problems any more than it's useful to having a human programmer craft explicit algorithms for computer vision.
You would expect a computer system "to come up with entire algorithms from scratch to solve problems" if you were making a computer system to solve the general problem of machine learning, i.e., a program to replace the research scientist making learning systems, not the human who currently solves the particular problem. We aren't trying to do that, it is a bit of a different direction, isn't it?
We have good reasons to assume that we ourselves, the sum of a particular person's memories, skills, identity and intelligence, are contained within a particular set of parameters encoded by different biochemical means in our brains, and the process of how we learn skills, facts and habits is literally a search through that space of parameters.
Actually, I have better reasons to assume that a parametrized model would poorly describe a human brain. We grow organically out of cells replicating in an environment for which they have been adapted. These cells make trillions of neural connections. Each cell has its own DNA etc.
We have already tried understanding just the DNA using straightforward parametric models, and they are too organic to be described that way.
It is far more likely that human brains are specifically adapted to the world they live in, and can operate with abstract concepts which are encoded in fuzzy ways (like a progressive JPEG, for example) that allow us to apply concepts to situations and search for concepts that fit situations. The concepts themselves are the hard part. It's not really a parametric model. Each concept represents experience that is stored between neurons.
Yes, we can teach these concepts to a computer eventually, but we would have to figure out a language to express this info and data structures to store it. We'd still be designing the computer to mimic what we think we do. Ultimately, for the computer to truly replicate what humans do it might need to simulate a gut brain, neurotransmitters etc. And even then it would be only a simulation.
I think computer intelligence is just of a very different sort than human intelligence. Less organic, with far less ability to come up with new concepts or reprogram itself to "understand" concepts. It is fed parametrized models and does a brute-force search or iterative statistical approximations, and then saves the precomputed results, that's all. That's why humans can recognize a cat with a brain that fits inside your head and consumes little energy, while computers need a huge data center which consumes a lot of energy.
We aren't replicating human intelligence. We are building huge number crunchers, and the algorithms are still written by humans.
Even our languages are so tied up in organic experience acquired over the years (references to current events, puns, emotions, fear of some animals vs dominance over others, inside jokes of each community, etc.) that language recognition is currently quite dumb and has trouble with context. Once again we solve this by dumbing down the human input, making people talk to computers differently than they would if the computer "understood" what they say the way a human with similar experience would.
When computers write algorithms to solve arbitrary problems the way groups of humans do, then I'd admit we made a huge leap forward. As it is, AlphaGo and self-driving cars are the result primarily of human work and refinement of the algorithms. It just is amazingly smart because computers crunch numbers fast, consistently and replicate what works across all the instances.
Computer AI does raise philosophical questions of identity and uniqueness, but currently they are not capable of true abstract thought.
And once again the rules "we all know" were fed to it by humans through a language and data structures and code devised by humans and now we will judge whether it does well and replicate the result to millions of machines. We are still doing nearly all the actual design.
Not sure why this is so highly upvoted. Nobody is questioning that deep networks work better than shallow ones, and there is a good understanding in academia of why (that fits with most lay people's intuition). I hardly consider that the most interesting or relevant question.
Actually, as a great fan of what has come out of the current deep learning hypefest, I'd question whether "deep" really matters. Most of the great successes have resulted from medium depth nets using the same shitty backprop algorithms that have been known for decades.
"Deep learning" is a red herring. They're just doing exactly what the shallow learning pioneers told them to do twenty years ago, just with more computing power.
There are a lot of innovations recently, not just the computing power -- layer-by-layer unsupervised training, batch normalization, ReLU, dropout, momentum, gated memory units, neural language modeling, encoder-decoder architectures, just to name a few.
"Since I am feeling especially bold, I will make another prediction: deep learning will not produce the universal algorithm. There is simply not enough there to create such a complex system."
While I (emotionally) agree, it will be interesting to see if the complexity (and non-linearity) of these algorithms permit 'emergent' behavior to appear.
You can create emergent behavior even with very crude, seemingly simplistic rule-based models. Forget about AI. Think about modeling processes. You can model a process using a very crude set of hand-crafted rules and still get useful simulations that give you valuable insight into the original process.
There is a wonderful free course about this kind of stuff:
Or, without wanting to sound too conspiratorial: it's easier to sell the magic of "intelligent machines" to the consumer if they have been drowned in AI coverage in the media by the time there finally are helpful algorithms and useful voice interactions and so on. This is completely speculative on my part, but I wonder about the autonomy of people, and how it could change mental models if all the machines around you are supposed to be smarter than you.
Has anyone proven logically that I, an entity, can create a new entity that is more intelligent than me? My intuition might be wrong, but how would that ever be a possibility, other than me aiding in the creation of a new human being?
Relatively weak human arms have enough strength to assemble tools that are much stronger than the original arms. A small, well placed flame can initiate a reaction that burns even hotter and brighter than itself. Our abstract concepts of strength, heat, brightness, and so on, are latent to varying degrees in the environment. Certain arrangements of raw materials release that potential.
If those analogies hold, then intelligence is still another concept expressed by some configurations of matter. Who knows how much "latent intelligence" is available to be released, but I guess the assumption is that it's much greater than what's already manifesting in our brains.
The most well-known rebuttal is that Einstein's mother created someone more intelligent than her. There was no intelligence when the universe began (at least as far as we know), then intelligence was created as an emergent property of biological systems.
I see it otherwise; those end-to-end approaches are indeed leading me to think that deep learning might be the one algorithm that could help us discover the universal representation of knowledge.
From my intuitive understanding (not an expert), a very abstract description of how it works in general:
- you have a real-world problem -> a task which you need to solve
- you build a model (algorithm, math method, etc.) which should solve the task
- you need to find the optimum of a complex function (the error function)
The third step is usually finding the optimum of the function. Deep neural networks help you move complexity from step 2 to step 3. One example you mentioned is when feature engineering is moved from 2 -> 3. So you can use simpler methods in step 2 to solve the same problems, or extend the range of problems which you can solve with the same complexity in step 2.
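A tiny sketch of step 3, with a toy linear model and made-up data, just to show what "finding the optimum of the error function" looks like in code:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 100)
    y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 100)   # "real world" data to fit

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        error = (w * x + b) - y
        # gradients of the mean squared error with respect to w and b
        w -= lr * 2 * np.mean(error * x)
        b -= lr * 2 * np.mean(error)

    print(round(w, 2), round(b, 2))  # recovers roughly 3.0 and 0.5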
Can anyone recommend a good resource that summarises what the different algorithms are best suited for aimed at novices?
I've been working my way through http://neuralnetworksanddeeplearning.com/ (with a big detour back into maths thanks to the Khan Academy) and have done a few ML courses, but they mainly cover a couple of algorithms, not all the ones available in spark's MLLib or tensorflow for example.
In my opinion, in the 80-90's, neural networks and machine learning used to be 10% a solid concept in terms of academic research and 90% hype. Now neural networks and machine learning are 10% a solid concept in terms of being a practical applicable tool and 90% hype. Things have changed a lot because I almost run out of fingers when trying to express the orders of magnitude in which raw processing power has increased. You can literally feed the network with anything when training and get reasonable results later in recognition. That's one impressive yet humanly vague hash table there. And no, you don't have to wait for months or weeks anymore to train new things. Not even days, necessarily.
Why people pull artificial intelligence into this is both naively optimistic and quite understandable. Modelling something like a neural system is so close to how the biological brain works that the parallel is blatantly obvious. On the other hand, the current deep networks do not translate to intelligence; not at all. Machine learning might be, in part, something we could describe as "intelligent", as it's able to connect dots that are very difficult to connect with traditional algorithms, but it absolutely is not intelligence. Then again, we do hang out in the same neighbourhood. If we ever create an artificial intelligence in software, I'm quite certain it will be very much based on some sort of massively deep and parallel network of dynamic connections.
I'm not that interested in artificial intelligence myself. I would be interested in artificial creativity and emotional senses, but to model those there are bigger metaphysical questions to be answered first.
I love the last sentence, and want to expand on it. If ANNs are tools to help computers perceive, then they are analogous to components or layers in the nervous system. If we map the nervous system thoroughly enough and understand the inputs and outputs of each layer/region, then reproducing a human-like nervous system might not be all that complicated.
If reproducing the human nervous system weren't that complicated, we could do drug design inside a computer. The ability to do that alone would be worth billions.
> If reproducing the human nervous system weren't that complicated, we could do drug design inside a computer.
While the myriad nuances of the entire human body are indeed significant roadblocks to drug development, we have a long way to go before those concerns represent the primary bottleneck to progress. If a simulation of a protein's local environment were to reach chemical accuracy (via either some algorithmic breakthrough in quantum chemistry or the development of scalable quantum computers), that would be a huge boon to drug development.
People have been working on neural nets for over 50 years now. The topic goes in and out of fashion. Nets are more powerful now and computers vastly more powerful.
https://en.m.wikipedia.org/wiki/Perceptrons_(book)
When you have to train a network with a zillion images of a dumbbell (http://www.businessinsider.sg/googles-ai-can-teach-us-about-...) for it to recognise what a dumbbell is, and then it still gets it wrong (adding arms!), then something's fundamentally broken, inasmuch as humans don't learn like that. DL is a huge step forward but it's not ever going to be any kind of AGI.
As usual with tools, even these, a clear understanding of the specific problem, the relevant metrics and the expected goal is decisive. I am saying that experimental protocols are still devised by humans against a cost vs opportunity matrix. Brute computational force is not independent yet, artificial intelligence has not emerged yet.
> deep learning will not produce the universal algorithm
I'm curious what HN users think the "universal algorithm" will end up looking like?
My own guess (wild speculation) is that we'll start moving in the direction of concepts like tensor networks. While that term sounds like it has something to do with machine learning, it actually falls under the domain of theoretical physics. Tensor networks are a relatively recent development in quantum mechanics that show promise because of their ability to extract the "interesting" information from a quantum state. Generally speaking, it's very difficult to compute/describe/compress a quantum state because it "lives" in an exponentially large Hilbert space. Traditionally, the field of quantum chemistry has built this space up using Gaussian basis functions, and the field of solid state physics has built it up using plane waves. The problem is that regardless of the basis set chosen, it appears as though exponentially more basis vectors are required to accurately describe a quantum state as the system becomes larger.
Tensor networks are an attempt to alleviate this problem. While it is true that the state space of an arbitrary quantum system is exponentially large in the number of particles, it turns out that for realistic quantum systems, the relevant state space is actually much smaller — i.e., real systems seem to live in a tiny corner of Hilbert space. And this tiny subspace even includes all of the possible states that one could put a collection of qubits into within the lifetime of the universe.
The projection of a system's state vector into either the position or momentum basis is known as the system's "wavefunction" (some texts allow more than these two bases). Since the wavefunction exhibits the highly desirable property of being localized in position/momentum space, this allows one to build up a good approximation to the state using Gaussians or plane waves — that is, unless the wavefunction exhibits strong electron correlation (quantum entanglement). Quantum entanglement is the exception to nature's tendency to localize state space about a point in spacetime, and thus it is frequently the case that the most commonly used basis sets are highly suboptimal for many real electronic systems (superconductors stand out as a notable and somewhat pathological example).
I'm not entirely familiar with all of the math behind it, but tensor networks essentially describe the small but relevant region of Hilbert space by exploiting properties of the renormalization group. In this sense, a compact way of describing "real world" quantum states is developed. I think this has applications to a "universal algorithm", because real world data rarely consists of a random or uniform scattering of information across the data's state space. In my own research, I've found that a lot of the NP-hard problems I run into are efficiently solvable in practice (stuff involving low rank PSD matrices) precisely because the data isn't random. If tensor networks are good at finding a basis set that is "local" in abstract Hilbert space with regard to some real-world set of quantum states, then it seems as though they would work equally well for a lot of the real world data that lives on a low-dimensional manifold in a high-dimensional space — the kind of data that machine learning (and eventually artificial general intelligence) seeks to tackle.
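To make the "tiny corner of Hilbert space" point concrete, here is a toy sketch in numpy (my own illustration; it only shows the Schmidt decomposition that tensor networks such as matrix product states build on, not the real DMRG/MERA machinery): construct a weakly entangled two-part state, reshape its vector into a matrix, and count how many singular values actually matter.

    import numpy as np

    # Two halves of a 16-qubit system: the full state vector has 2^16 amplitudes.
    n_left, n_right = 2**8, 2**8
    schmidt_rank = 4  # assume weak entanglement across the cut

    # Build a low-Schmidt-rank state as a sum of a few random product states.
    rng = np.random.default_rng(0)
    psi = sum(np.kron(rng.normal(size=n_left), rng.normal(size=n_right))
              for _ in range(schmidt_rank))
    psi /= np.linalg.norm(psi)

    # The Schmidt decomposition is just an SVD of the reshaped state vector.
    sigma = np.linalg.svd(psi.reshape(n_left, n_right), compute_uv=False)
    kept = int(np.sum(sigma > 1e-10))
    print(f"amplitudes: {psi.size}, significant Schmidt values: {kept}")
    # -> 65536 amplitudes, but only ~4 significant Schmidt values: the state is
    #    described by a handful of numbers, which is the kind of structure
    #    tensor networks exploit at scale.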
I think the concept of "interpretability" is what you are getting at. I group that in with automatic feature engineering, since they are the same idea from different perspectives. Sometimes that is a benefit, sometimes it's not: http://blog.keras.io/how-convolutional-neural-networks-see-t...
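For concreteness, here is a rough sketch of the gradient-ascent trick that post describes; the post uses Keras, but I've written it in PyTorch (torchvision's pretrained VGG16, with an arbitrary layer and filter index, purely as an assumed example): start from noise and push the input toward whatever pattern maximally excites one filter.

    import torch
    import torchvision.models as models

    # Pretrained convnet, frozen; we only optimise the input image.
    cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
    for p in cnn.parameters():
        p.requires_grad_(False)

    layer_idx, filter_idx = 10, 7            # arbitrary choices for illustration
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=0.1)

    for _ in range(100):
        optimizer.zero_grad()
        x = img
        for i, module in enumerate(cnn):     # forward only up to the chosen layer
            x = module(x)
            if i == layer_idx:
                break
        loss = -x[0, filter_idx].mean()      # gradient *ascent* on the activation
        loss.backward()
        optimizer.step()
    # `img` now approximates the visual pattern that filter responds to.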
> "deep learning will not produce the universal algorithm"
I doubt that a general algorithm exists (why should it?).
But well, if we are talking about human-level (or superhuman-level) AI, it is good to remember that WE are deep, recurrent neural networks (with a very different implementation, and spikes instead of floats, but still). If it works in vivo, why shouldn't an abstracted version work in silico?
> Nothing is more frustrating when discussing deep learning than someone explaining their views on why deep neural networks are “modeled after how the human brain works” (much less true than the name suggests) and thus are “the key to unlocking true artificial intelligence”.
While I get what he is saying here, and more or less agree, I think it should not be taken lightly that there is a significant difference in this discussion now compared to 30 years ago. The difference is not in how neural networks work (which clearly differs from, though relates in some ways to, the brain), but in what neural networks see.
What is really significant, now that we can handle lots and lots of data and throw it all at a giant neural network, is what we see happening inside the network. The observation that the hidden-layer filters that develop as optimal features for classifying images turn out to be Gabor-like directional filters (I'm referring of course to this type of thing [1]) is not random, and it is not an insignificant result. It really does relate to perception, in the sense that 1) we know the brain has directional filters in the visual cortex and 2) more importantly, from signal processing theory we know that such filters are "optimal" from a certain mathematical point of view. If they develop naturally as the best way to interpret "natural" images (or other natural data, such as audio [2]), it suggests that the development of such filters in the brain is also quite likely. There is quite a bit of research in neuroscience at the moment looking for evidence of such optimal filters in early neural pathways.
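You can see this directly by pulling the first convolutional layer out of any large pretrained image model and plotting its weights; most of them look like oriented, Gabor-ish edge and colour-opponent detectors. A minimal sketch, assuming PyTorch/torchvision and matplotlib (AlexNet chosen arbitrarily because its 11x11 first-layer filters are easy to see):

    import torch
    import torchvision.models as models
    import matplotlib.pyplot as plt

    model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    filters = model.features[0].weight.detach()      # shape: (64, 3, 11, 11)

    # Rescale each filter to [0, 1] so it can be shown as an RGB patch.
    lo = filters.amin(dim=(1, 2, 3), keepdim=True)
    hi = filters.amax(dim=(1, 2, 3), keepdim=True)
    filters = (filters - lo) / (hi - lo)

    fig, axes = plt.subplots(8, 8, figsize=(8, 8))
    for ax, f in zip(axes.flat, filters):
        ax.imshow(f.permute(1, 2, 0).numpy())        # (H, W, C) for imshow
        ax.axis("off")
    plt.show()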
So yes, neural networks are not models of "how the brain works", but the newly established ability to process huge amounts of data, and to examine what kind of learning happens in order to optimise that processing, can tell us a lot about the brain -- not how it works, but what it must do. Complemented with work in neuroscience, the idea of modelling information processing is not unrelated and can really lead to significant contributions to our understanding of perception... and perhaps, eventually, cognition -- but who knows.
The misunderstanding here is thinking that the be-all and end-all of neuroscience is studying how neurons fire and interact. Neuroscience is much more than that. Neuroscientists want to know how we experience and understand the world, and a big part of that is understanding what is required to process and interpret information: what the information is, what its statistics are, and what kind of neural processing would be required to extract it from our sensory inputs. Of course, this must be complemented by studies of how humans actually react to stimuli, to verify that we do process information according to some model. But the model being verified comes from what we know about information processing, and computer science can contribute there in a significant way.
No, it's a direct criticism of lazy or sloppy writing style in which authors fail to communicate effectively.
The term has specific meaning within a specific field (not clearly identified), and has been growing rapidly (which is to say: hasn't reached a stable level of penetration) within printed material over the past decade or so.
Disclosure: I work on the open-source DL project Deeplearning4j: http://deeplearning4j.org/