It seems to me that deep learning is the cortex and the binding function is the hippocampus. Add some executive control functions to guide learning strategically and it will start sounding eerie.
[5 months later]
We welcome our new AI-Rat overlords? ;)
It was developed by Pentti Kanerva at NASA in the '80s.
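Assuming this refers to Kanerva's Sparse Distributed Memory, the core mechanism fits in a few lines: random binary "hard locations," a write that distributes a word over all locations within a Hamming radius of its address, and a read that pools counters and takes a majority vote. This is just a toy sketch; the parameters (N, M, R) are illustrative, not from Kanerva's work.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 256    # bits per address/word
M = 1000   # number of hard locations
R = 115    # activation radius (Hamming distance); illustrative value

hard = rng.integers(0, 2, size=(M, N))        # fixed random hard locations
counters = np.zeros((M, N), dtype=np.int64)   # one signed counter per bit

def activated(addr):
    """Boolean mask of hard locations within Hamming radius R of addr."""
    return (hard != addr).sum(axis=1) <= R

def write(addr, word):
    # distribute the word over all activated locations:
    # increment counters where the bit is 1, decrement where it is 0
    counters[activated(addr)] += np.where(word == 1, 1, -1)

def read(addr):
    # pool the activated counters and take a per-bit majority vote
    sums = counters[activated(addr)].sum(axis=0)
    return (sums > 0).astype(int)

# auto-associative use: store a pattern at its own address,
# then recover it from a corrupted copy
pattern = rng.integers(0, 2, size=N)
write(pattern, pattern)
noisy = pattern.copy()
noisy[rng.choice(N, size=10, replace=False)] ^= 1   # flip 10 bits
recalled = read(noisy)
```

Because reads and writes both fan out to many overlapping locations, a noisy address still activates most of the locations the clean pattern was written to, which is what gives the memory its content-addressable flavor.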
For related memory-augmented models see: https://en.wikipedia.org/wiki/Deep_learning#Networks_with_se...
There was also "Reasoning, Attention, Memory (RAM)" NIPS Workshop 2015 organized by Jason Weston on this topic: http://www.thespermwhale.com/jaseweston/ram/
There is a disproportionate amount of work on training learning models while memory mechanics are ignored. For some reason this research is pursued by very few machine learning/AI labs, mostly Google DeepMind/Brain, FB, and Numenta, and maybe Tom Mitchell's 'never-ending learning' project at CMU.
Also, industry applications of deep learning lag academia by at least 1-2 years, so I don't expect to see differentiable memory mechanisms in production anytime soon.
Anyway, the workshop reading list looks amazing. Thanks for this!
That would allow easier reproducibility, which is the opposite of the goal for papers that come out of industry. These are the purposes of industry papers (in order of importance):
1) Avoid giving enough clarity that competing companies could reproduce the work.
2) Brag about the company's capabilities such that it increases interest from potential customers.
3) Maintain just enough scientific rigor and clarity that it is still publishable.
1 is generally the top priority. 2 is the goal. If 1 prevents 3, they just call it a white paper and publish only to arXiv or on their website. That way you still get exposure for 2 without compromising 1.
IIRC, that paper's results focus on the model's ability to extrapolate predictions on sequential data to arbitrary lengths after being trained only on sequences of a fixed length.
Brenden Lake et al. showed that a Bayesian approach outperforms deep-learning-based methods in this Science article: http://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.p...
DeepMind took the challenge and delivered a deep-learning-based method that uses no feature engineering and exceeds human-level performance in low-shot learning, providing strong evidence against the widely held notion that deep learning cannot work with small amounts of data.
As it is, it could be that this particular setup happens to extract features well suited to this particular dataset while failing miserably on others.
Thinking about this at a higher level, I ask myself what constitutes true one-shot learning. The reason we care about it is that in real life most problems don't come with huge datasets of available solutions. On the other hand, this paper does involve training a model on a large dataset of similarly typed, labeled data, and the problem the algorithm solves involves items drawn from that same set.
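To make the setup concrete, here's a minimal sketch of what a one-shot classification episode looks like: one labeled example per class, and a query classified by nearest neighbor. Everything here is a toy stand-in (random prototypes in place of real data, raw feature space in place of a learned embedding), which is exactly why the "where do the features come from" question above matters.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_shot_classify(support, query):
    """Classify query by its nearest neighbor among the single support
    example given for each class. The 'embedding' here is raw feature
    space; real one-shot systems learn it from a large related dataset."""
    labels = list(support)
    dists = [np.linalg.norm(support[label] - query) for label in labels]
    return labels[int(np.argmin(dists))]

# toy stand-in for an episode: each class is a random prototype,
# and every example is a noisy copy of its prototype
protos = {c: rng.normal(size=64) for c in "ABCDE"}
support = {c: p + 0.1 * rng.normal(size=64) for c, p in protos.items()}
query = protos["C"] + 0.1 * rng.normal(size=64)
prediction = one_shot_classify(support, query)
```

The entire difficulty of one-shot learning is hidden in the distance function: on toy Gaussians raw distance works, while on real data it only works if the embedding learned from the big training set transfers to the new classes.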
The first obvious question is the one I asked above: does this approach work for other types of data? The second one is whether the mechanism would work for more diverse datasets. Finally, the most important question is: how well will it perform on tasks that fall far outside of the initial training data? Because that's the true challenge behind one-shot learning.
It's kind of amazing that the paper doesn't try to answer any of those questions. Isn't that the real purpose of AI research? (Most likely they tried and the results weren't good, but there is no way to tell without re-implementing the whole thing.)
They compare it with human performance, claiming their algorithm is better, but it's kind of a bullshit claim, since people weren't allowed to use scratch paper or see previous examples. So instead of a pattern-matching test they turned it into a memory test. Typical of current AI research: those people will do anything to be able to claim "better than human" performance.