Not true. This paraphrases the original paper:
> They tested their best-performing policy network against Pachi, the strongest open-source Go program, which relies on 100,000 simulations of MCTS at each turn. AlphaGo's policy network won 85% of the games against Pachi! I find this result truly remarkable. A fast feed-forward architecture (a convolutional network) was able to outperform a system that relies extensively on search.
Also, this article reeked of AGI ideas. Deep learning isn't trying to solve AGI. Reasoning, abstraction, and other high-level AGI concepts don't, in my view, apply to deep learning. I don't know the path to AGI, but I don't think it'll be deep learning. I think it would have to be something fundamentally different.
I think that this is actually what the article is arguing.
from the article:
>Models closer to general-purpose computer programs, built on top of far richer primitives than our current differentiable layers—this is how we will get to reasoning and abstraction, the fundamental weakness of current models.
This means not using current deep learning ideas, and instead finding ways to integrate other types of programs (conventional algorithms, other types of ML) alongside Deep Networks.
Of course it did. It is about how machine learning can evolve to solve problems that require reasoning and abstractions.
The previous article was about the limits of machine learning, and this one is about how to overcome them in the future. Those limits were pretty much defined as "reasoning and abstraction", so of course this article is about how to get that working.
Well, I dunno about "deep learning", but AGI is DeepMind's explicitly stated goal.
They go on to talk about general purpose learning machines.
This is a far cry from AGI. I think Dr. Hassabis played with the terminology in the video in a rather tongue-in-cheek manner. Deep learning and all the modern AI stuff you hear about is within the realm of "narrow AI", or more formally, applied AI. In his video, he uses "narrow AI" for systems that rely on expert heuristics and feature engineering, and "general-purpose AI" for what they are currently doing with reinforcement learning.
Whilst it's wonderful that their advancement in reinforcement learning has been applied to various different problems successfully, it shouldn't be confused with AGI.
AGI is on a totally different playing field. I don't think we are substantially closer to AGI than we were 50 years ago, and I would be very interested in anyone arguing the opposite.
I think at this point the only company trying to seriously tackle AGI is: https://numenta.com/
What do you mean? Each layer is selecting features and abstracting previous layers. A cat neuron abstracts all possible pixels that form cats.
He's just saying that we need better ways of composing these "deep" abstractions.
Tensor2Tensor, from the Google Brain team, has some strong recent results in that direction.
Related paper: "One Model To Learn Them All".
EDIT: I'm thinking deep learning will become much like web development is today. Everybody can do it, and only a few experts will work at the technological frontier and develop tools and libraries for everybody else to use.
Therefore, if one invests time in DL, I suppose it had better be a serious effort (at research level) rather than at the level of invoking a library, because soon everybody will be able to do that.
I expect that common deep learning tasks are going to end up being performed by large pre-trained networks behind an API. Image tagging, sentiment analysis, etc. It will be possible to fine-tune these networks for specific tasks by retraining top layers using new data.
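That fine-tuning recipe can be sketched without any framework at all. In the toy below (pure NumPy, with every name and number made up for illustration), a fixed random projection stands in for the frozen body of a pretrained network, and only a logistic-regression "top layer" is retrained on new data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "pretrained" lower layer: a fixed random projection that we
# never update. In practice this would be the frozen body of a large network.
W_frozen = rng.normal(size=(4, 8))

def features(x):
    # Frozen feature extractor; only the top layer below gets trained.
    return np.tanh(x @ W_frozen)

# New-task data (made up): binary labels derived from the raw inputs.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Trainable "top layer": logistic regression on the frozen features.
w, b, lr = np.zeros(8), 0.0, 0.5
F = features(X)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))    # sigmoid
    w -= lr * F.T @ (p - y) / len(y)          # update top-layer weights only
    b -= lr * np.mean(p - y)

acc = np.mean(((F @ w + b) > 0) == (y > 0.5))
print(f"fine-tuned top layer, train accuracy: {acc:.2f}")
```

The frozen body never sees a gradient, which is exactly what makes "retrain the top layers behind an API" cheap.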
I don't think so.
The many problems bedeviling the expansion of an AI's competence at one specific task into mastery of more general and more complex tasks are legion. Alas, neither deep nets nor genetic algorithms have shown any way to address classic AGI roadblocks like: 1) the vastness of the possible solution space when synthesizing candidate solutions, 2) the enormous number of training examples needed to learn the multitude of common-sense facts shared by all problem spaces, and 3) how to translate existing specific problem solutions into novel general ones. Wait, wait, there's more...
These roadblocks are common to all forms of AI. The prospect of replacing heuristic strategies with zero knowledge techniques (like GA trial and error) or curated knowledge bases with only example-based learning is unrealistic and infeasible. Likewise, the notion that a sufficient number of deep nets can span all the info and problem spaces that will be needed for AGI is quite implausible. While quite impressive at the lowest levels of AI (pattern matching), deep learning has yet to address intermediate and high level AI implementation challenges like these. Until it does, there's little reason to believe DL will be equally good at implementing executive cognitive functions.
Yes, DeepMind solved Go using AlphaGo's deep nets (and Monte Carlo tree search). But in the years and decades before that, IBM's Watson solved Jeopardy and IBM's Deep Blue solved chess. At the time, everyone was duly impressed. Yet today nobody is suggesting that the AI methods at the heart of those game solutions will one day pave the yellow brick road to AI Oz.
In another 10 years, I predict it's just as likely that AlphaGo's deep nets will be a bust as a boom, at least when it comes to building deep AI like HAL 9000.
Absolute speed depends on the machine resources and the low-level optimization. As for MCTS, it is "fast enough", considering it has already beaten Lee Sedol.
There are multiple ways to measure speed. Blind search is "fast" in terms of node expansion, but it takes exponential time to find a solution and quite likely exhausts memory. Search with heavy heuristic functions has "slow" expansion but can solve the problem quickly, despite the cost of evaluating them, since it prunes a large part of the state space and expands far fewer nodes. Only the latter is practically meaningful.
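That trade-off can be illustrated with a toy comparison (my own sketch, not from the thread): breadth-first blind search versus A* with a Manhattan-distance heuristic on an empty grid, counting how many nodes each expands before reaching the goal:

```python
from collections import deque
from heapq import heappush, heappop

# Empty 20x20 grid, start in one corner, goal in the opposite corner.
N = 20
start, goal = (0, 0), (N - 1, N - 1)

def neighbors(p):
    x, y = p
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < N and 0 <= ny < N:
            yield (nx, ny)

def bfs_expansions():
    # Blind search: expands essentially every node before reaching the goal.
    seen, frontier, count = {start}, deque([start]), 0
    while frontier:
        p = frontier.popleft()
        count += 1
        if p == goal:
            return count
        for q in neighbors(p):
            if q not in seen:
                seen.add(q)
                frontier.append(q)

def astar_expansions():
    # Heuristic search: an admissible Manhattan-distance heuristic, breaking
    # f-ties toward deeper nodes, drives the search almost straight to the goal.
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap, g, count = [(h(start), 0, start)], {start: 0}, 0
    while open_heap:
        _, _, p = heappop(open_heap)
        count += 1
        if p == goal:
            return count
        for q in neighbors(p):
            if q not in g or g[p] + 1 < g[q]:
                g[q] = g[p] + 1
                heappush(open_heap, (g[q] + h(q), -g[q], q))

print("blind:", bfs_expansions(), "heuristic:", astar_expansions())
```

Each expansion is a bit more expensive for A* (it evaluates the heuristic and maintains a priority queue), yet it does orders of magnitude less total work, which is exactly the point above.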
Just read the classical AI textbooks like AIMA first, then [this one](https://www.amazon.co.uk/Heuristic-Search-Applications-Stefa...) if you want to know more. Maybe [this short paper](https://www.cs.cmu.edu/~maxim/files/searchcomesofage_aaai12_...) may give you the gist.
The best search methods allow reversible state updates. The reversibility makes things super-fast -- you no longer need to copy all data structures on path expansion. Instead, you modify only a single representation, incrementally. And when retracting a search step, you "undo" the same modifications again, arriving at the exact same state as when you started.
This is of course non-trivial -- it is much easier to copy everything, then throw away the entire copy when it's not needed, rather than keeping a single state incrementally consistent. But the effects due to data locality (excellent caching), better memory management (no allocations, fragmentation) and less work (only touch and update parts of the state that matter) can be tremendous.
I haven't seen any discussion of DeepMind's implementation details for AlphaGo, but since they come from a game development background (David Silver was the CTO and lead dev at Elixir Studios), where each cycle counts, I have no doubt they're well familiar with all these concepts. But then the TPU throws a wrench into it again...
MCTS's playouts do not need to backtrack (they are just greedy probes), so it is irrelevant. And don't confuse backtracking with backpropagation in MCTS.
I do not see the connection to TPU.
The concept is completely orthogonal to choosing which node (next state) to expand next. It's about managing the internal representation of a state.
> choosing which node (next state) to expand next
I know this is "heuristic search", a broader class of algorithms which includes A＊ and IDA＊ and many more. But the core selling point of IDA＊ is not the "heuristic search" itself, but its compact linear-space memory usage (and thus speed) compared to A＊. So I did not suggest IDA＊ because of its heuristic-search aspect.
For an efficient implementation, reversible updates are just natural and common in IDA＊. Below is an excerpt; I believe this is equivalent to what you call reversible state updates.
> In-place Modification: The main source of remaining overhead in the expansion function is the copy performed when generating each new successor (line 12 in Figure 1). Since IDA＊ only considers a single node for expansion at a time, and because our recursive implementation of depth-first search always backtracks to the parent before considering the next successor, it is possible to avoid copying entirely. We will use an optimization called in-place modification, where the search only uses a single state that is modified to generate each successor and reverted upon backtracking.
By the way, this "local-changes-only" approach crops up in many other places too, in CS and in nature, because it's just so damn energy-efficient.
For example, check out this (recent, July 2017) paper:
Gomez, Ren, Urtasun & Grosse: "The Reversible Residual Network: Backpropagation Without Storing Activations", https://arxiv.org/abs/1707.04585.
You can't help but think of the ancient programmer's trick for swapping two variables without a temp storage...
    a = a + b;
    b = a - b;
    a = a - b;
And the author posted a comment on hn:
"fchollet: Hardly a "made-up" conclusion -- just a teaser for the next post, which deals with how we can achieve "extreme generalization" via abstraction and reasoning, and how we can concretely implement those in machine learning models."
I like the ideas presented in the post, but it's not concrete or new at all. Basically he writes "everything will get better".
I do agree with the point that we need to move away from strictly differentiable learning, though. Deep learning only works on systems that have derivatives, so that we can do backpropagation. I don't think the brain learns with backpropagation at all.
* AutoML: there are dozens of these types of systems already; he mentions one in the post, called Hyperopt. So we will continue to use these systems and they will get smarter? Many of these systems are basically grid search / brute force. Do you think the brain is doing brute force at all? We have to use them now because there are no universally correct hyperparameters for tuning these models. As long as we build AI models the way we do now, we will have to do this hyperparameter tuning. Yes, these will get better; again, nothing new here.
* He talks about reusable modules. Everyone in the deep learning community has been talking about this a lot; it's called transfer learning, people are using it now, and they are working on making it better all the time. We currently have "model zoos", which are databases of pretrained models that you can use. If you want to see a great sci-fi short piece on what neural-network mini-programs could look like, written by the head of computer vision at Tesla, check out this post: http://karpathy.github.io/2015/11/14/ai/
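To make the "grid search / brute force" flavor of the AutoML point above concrete, here is a minimal random-search sketch. The scoring function is entirely made up for illustration; in a real system it would mean "train the model with these hyperparameters and return the validation accuracy":

```python
import math
import random

random.seed(0)

def validation_score(lr, width):
    # Made-up stand-in for "train a model, return validation accuracy";
    # it peaks near lr = 0.01 and width = 64.
    return math.exp(-(math.log10(lr) + 2) ** 2) * \
           math.exp(-((width - 64) / 64) ** 2)

# Random search: sample hyperparameters blindly and keep the best trial.
best = None
for _ in range(50):
    lr = 10 ** random.uniform(-5, 0)               # log-uniform learning rate
    width = random.choice([16, 32, 64, 128, 256])  # layer width
    score = validation_score(lr, width)
    if best is None or score > best[0]:
        best = (score, lr, width)

print(best)
```

Nothing in the loop knows anything about the model; it is pure trial and error, which is the point being criticized.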
I think there will be some kind of meta deep learning (still using deep learning, but composed of algebras that are augmented compared to today's standards). We have already started this by using pretrained networks for tasks. There is no reason RNNs won't go this way (I imagine they already are, but this isn't my research area specifically); after all, RNNs are Turing-complete.
I somewhat disagree with the author. I don't think that deep learning systems of the future are going to generate "programs" composed of programming primitives. In my speculative view, the key to general intelligence is not very far from our current knowledge. Deep learning, as we currently have it, is a good enough basic tool. There are no magic improvements to the current deep learning algorithms hidden around the corner. Rather, what I think will enable general intelligence is assembling systems of deep learning networks in the right setup. Some of the structure of these systems will be similar to traditional programs, but the models they generate will not resemble computer programs. They will be more like data graphs.
I expect within 10 years there will be computer agents capable of communicating in simplified, but functional languages. Full human language capability will come after that. And within 20 years I expect artificial general intelligence to exist. At least in a basic form. That is my personal view. I am currently working on this.
> I expect within 10 years there will be computer agents capable of communicating in simplified, but functional languages. Full human language capability will come after that. And within 20 years I expect artificial general intelligence to exist. At least in a basic form. That is my personal view. I am currently working on this.
When has "20 years" not been in line with the predictions of experts for the advent of AGI?
Another example: feeding a neural network millions of product reviews and expecting it to be capable of understanding and writing product reviews is hopelessly laughable. Not even with petabytes of data.
I started working on this because I think that not enough focus is being put on AGI, or the research is not creative enough. At least not the research that is being published. I am optimistic about my work, and soon I will reveal more. But even if I don't reach my goals, I think it is just a matter of persistence. Sooner or later, someone will solve the problem of AGI.
Do you have peer-reviewed publications? A github link? References to such? Talking big on the internet is easy.
I'm more interested in a general AI that can learn any game or environment it encounters to optimize a return. Not quite general AI, but a different path than what is going on right now.
It's all well and good to say we need generalizable machines, and something other than backprop, and something closer to traditional programs, but we all know this. The issue is that no one knows what this would even mean, never mind how one would go about implementing it. In the few cases we do know how, the results are horrible compared to the methods we already use.
We use the methods we do today because they work, not because we think they are the best, or because we don't understand the limitations of our models.
The only thing that's a bit off about Keras is that it's mostly the efforts of one guy. Sure, there's many other contributors, but they don't seem to be acknowledged. I've never seen anyone else speak for the project. I'd really like to see a neutral party emerge for deep learning practice and tooling, before the whole industry gets sucked into a single dominant ecosystem like AWS.
Using Artificial Intelligence to Write Self-Modifying/Improving Programs
There is always a research paper, if you prefer the sciency format.
BF-Programmer: A Counterintuitive Approach to Autonomously Building Simplistic Programs Using Genetic Algorithms
* Input of continuous unlabeled time-based patterns.
* Associative Hebbian Learning (when distinct inputs/patterns come together, they are neuron-wired together). Synapses can be modified via experience. See "Hebbian Theory".
* The brain is a prediction machine: it is always trying to predict the future based on past learned patterns. Learning happens when reality does not match the original prediction and we rewire the world model based on the new input. See "Bayesian approaches to brain function".
* Input signals are processed by many layers, each one creating more abstraction from the previous one, from sensory neurons to the highest cortex layers.
* Each region of the hierarchy forms invariant memories (what a typical region of cortex learns is sequences of invariant representations).
* There is lots of feedback (from the highest-level neurons back to the lowest levels). In some structures (e.g. the thalamus, which is a kind of "hub of information"), connections going backward (toward the input) exceed the connections going forward by almost a factor of ten.
* Brain uses Sparse Distributed Memory (SDM). See SDM by Pentti Kanerva (NASA researcher).
* Neuron models have many more variables/parameters (that can be used to transfer or process information) than the usual nodes/links of artificial neural networks. E.g.: long-term potentiation vs. long-term depression, neuronal habituation vs. sensitization, inhibitory vs. excitatory neurons, firing rates, synchronization, neuromodulation, homeostasis, and more.
* The backward propagation of errors in artificial neural networks only occurs during the learning phase. But the brain is always learning and updating weights and relationships between patterns, given new inputs.
* During repetitive learning, representations of objects move down the cortical hierarchy (from short-term memory to long-term memory), forming invariant memories.
* The brain needs to replay the memory (memory rehearsal) of a learned stimulus so it can be stored in long-term memory.
* The job of any cortical region is to find out how inputs are related (pattern recognition), to memorize the sequence of correlations between them, and to use this memory to predict how the inputs will behave in the future.
* Predictive coding: the brain is constantly generating and updating hypotheses that predict sensory input at varying levels of abstraction.
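The Hebbian point in the list above ("when distinct inputs/patterns come together, they are neuron-wired together") can be illustrated with a toy simulation of my own, using the textbook update rule dw = eta * pre * post: the weight from an input that co-fires with the output grows, while the weight from an uncorrelated input stays smaller.

```python
import random

random.seed(1)

eta = 0.1          # learning rate
w = [0.0, 0.0]     # synaptic weights for two inputs onto one output neuron

for _ in range(100):
    x0 = random.choice([0, 1])   # input correlated with the output
    x1 = random.choice([0, 1])   # independent, uncorrelated input
    post = x0                    # the output neuron fires whenever x0 fires
    # Hebbian rule: dw = eta * pre-activity * post-activity
    w[0] += eta * x0 * post
    w[1] += eta * x1 * post

print(w)  # w[0] strengthens with every co-activation; w[1] only by chance
```

No error signal or backpropagation is involved; the weight change depends only on locally available pre- and post-synaptic activity, which is why this rule is often contrasted with gradient-based learning.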
Do counterfactuals have something to do with learning from negative examples and simulations? For example, if one shoots a ball and misses the goal to the right, one does not 'mindlessly' penalize the circuits that led to the exact motor decisions that were involved, but instead, one simulates alternative actions and uses e.g. (in this case linear) relationships between e.g. the angle of the foot or the wind speed and the shooting direction. The next time, one hence tries to aim slightly to the left.
Or are you referring to a much more fundamental level and my example might rather be a learning strategy that is more likely acquired by trial & error, reinforcement learning, meta learning ("learning how to learn") and/or via the shared concept space of language and culture?
Is it maybe related to e.g. prototype-based associative recall and a counterfactual is basically an alternative way of interpreting the data? "What error signal would I get, if I had interpreted X as Y?"
Or does it come from the Bayesian approach where you marginalize out all hypotheses, including the factual one that corresponds to the state of the world, but also all counterfactual hypotheses. So, including counterfactuals means going beyond the maximum likelihood point estimate e.g. by communicating confidence intervals or even entire distributions from neurons to neurons or neuron populations to other neuron populations?
Are there works that expose this limitation of MLPs more formally?
>not everything good can be made More Neural.
Neural networks are universal function approximators, so you probably mean not everything good can be made with MLPs trained by gradient descent?
>It's the lack of [...] the ability to perform inferences over discrete spaces.
How would you judge the extent to which AlphaGo has learned to react to single discrete changes in the input? It seems that it learned to react very sharply to whether a single stone is placed at a strategically significant position.
For instance, you want to program the personality of a toy, so you search around using the AI search engine for parts that might work. Or you want a relationship advice coach so you put it together using personalities you like, taking only the parts you want from each personality. Or another example would be just to make remixes of media you like. Because everything works without programming anyone can participate.
AFAIK GP remains the primary means to automate the synthesis of software. Though it was introduced perhaps 30 years ago, it hasn't been an active area of research for the past 20, AFAIK.
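For a flavor of that kind of zero-knowledge trial and error, here is a minimal genetic-algorithm sketch of my own (evolving a bitstring toward an all-ones target rather than a program tree, purely for illustration; real GP evolves program structures):

```python
import random

random.seed(42)
LENGTH, POP, GENS = 32, 40, 60   # genome length, population size, generations

def fitness(bits):
    return sum(bits)             # count of correct (1) bits

def mutate(bits, rate=0.02):
    # Flip each bit independently with a small probability.
    return [b ^ (random.random() < rate) for b in bits]

def crossover(a, b):
    # Single-point crossover of two parent genomes.
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]     # truncation selection keeps the fitter half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children     # elitism: the best individuals survive

best = max(pop, key=fitness)
print(fitness(best))
```

The search uses nothing but fitness feedback, mutation, and recombination, which is exactly why it scales so poorly once the solution space is programs rather than short bitstrings.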
It would allow maintaining versioned, continuously improving models that everyone can update with an `npm update`, `git pull`, or equivalent.
That one word disrupts his whole point of view. This idea that we need orders and orders of magnitude more data seems insane. What we need is to figure out how to be more effective with each layer of data, and be able to have compression between the tensor layers.
The brain does a great job of throwing away information, and yet we can reconstruct pretty detailed memories. Somehow I find it hard to believe that all of that data is orders of magnitude above where we are today. Much more efficient, yes. And that's through compression.
I guess that's what I get after walking away for 30 minutes before posting. Doh!