I don't get why deep learning researchers are so hung up on learning everything from scratch. The trend toward ever more compute and data is just unsustainable. For problems like natural language, where the set of distinct events may be effectively infinite, you can keep throwing data at the problem and you'll never make a dent in it. There are problems that grow at a pace that cannot be matched by any computer, no matter how powerful.
What's more, as a civilisation we have nothing if not knowledge about the world. We have been accumulating it for thousands of years. It's what makes the difference between an intelligent human and an educated intelligent human. And it's a big difference. So, if we have all this background knowledge, why not use it, and make our lives easier?
Sussman attains enlightenment
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
“What are you doing?”, asked Minsky.
“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.
“Why is the net wired randomly?”, asked Minsky.
“I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky then shut his eyes.
“Why do you close your eyes?”, Sussman asked his teacher.
“So that the room will be empty.”
At that moment, Sussman was enlightened.
The problem is that these are extremely high-dimensional spaces, and even narrowing down the space within which the prior might be defined (much less finding the prior itself) is mathematically difficult. Hence progress is slow and halting; it has consisted of discussions of, say, what representations fully-trained neural networks form in their early activation layers, which means the question is being investigated empirically, without the benefit of a fully-fledged theory.
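For what it's worth, that kind of empirical probing is often done by hooking into a trained network and inspecting its early activations. A minimal sketch in PyTorch, assuming a recent torchvision; the choice of ResNet-18 and of `layer1` as "early" is purely illustrative:

```python
import torch
import torchvision.models as models

# An ImageNet-trained network (illustrative choice; any trained CNN would do).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook an early stage to look at what the first layers respond to.
model.layer1.register_forward_hook(save_activation("layer1"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # stand-in for a real image batch

early = activations["layer1"]
print(early.shape)                       # torch.Size([1, 64, 56, 56])
print(early.mean().item(), early.std().item())
```

From there, people visualize channels, fit linear probes, or compare representations across layers; the hook is just the entry point.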
Interestingly, the 'nature vs. nurture' debate can be viewed as basically a discussion of learning priors in the context of human beings. It turns out the truth is pretty subtle: for many problem domains, humans (and animals) can be made to learn a very large set of things with repeated training, but we also have strong priors, so some things can be taught much more easily and quickly than others.
Our brain networks are primed to learn certain things, e.g. ducklings have a prior that the first animate object they see is their mother. It's possible to force a duckling to unlearn its maternal imprinting, but only with repeated and large amounts of negative conditioning. It seems pretty likely that this prior is embedded in both the topology and neurochemical functioning of a normally developing brain.
That's what I got from the koan. Sussman thinks that a randomly wired NN has no prior, but that's false, just as Minsky only "thinks" that the room is empty when he closes his eyes.
EDIT: actually if you read the source (spetharrific.tumblr.com) it spells this out.
It is easier to formulate the problem when you are dealing with only one specific task. Transfer learning is useful, but in its current form it is limited and can only borrow features from tasks that are similar. Meta-learning is trying to tackle this problem specifically; promising results have been presented, but there is still a long way to go before it is useful in real-world production.
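For concreteness, "borrowing features from a similar task" usually means freezing a pretrained backbone and retraining only a small head. A rough PyTorch sketch, assuming a recent torchvision; the 10-class target task and the dummy data are placeholders, not anything from the thread:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ImageNet-pretrained backbone: the "prior" borrowed from a related task.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained features so only the new head is learned.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classifier head for a hypothetical 10-class target task.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Optimize only the new head's parameters.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = loss_fn(backbone(x), y)
loss.backward()
optimizer.step()
```

The limitation the comment points at shows up exactly here: this only helps when the frozen features happen to be relevant to the new task.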
In order NOT to learn from scratch, we need to understand the representation of knowledge. For neural networks, the knowledge is the parameters that were learnt. However, those parameters are tied to the task you trained your network on, so we are back to square one. I doubt continuous, differentiable tensors are the ONLY representation of human knowledge. They might capture the hard-to-explain, intuitive part of it; however, a large, probably larger, part of our knowledge is better represented by categories/hierarchies/graphs/rules that are not easily differentiable, which current deep learning techniques struggle to model.
So, IMO, it is not really that researchers are obsessed with supervised learning, it is probably that they have no better alternatives.
―Spoon Boy to Neo
category/hierarchy/graph/rules are important, but I suspect only our conscious/social minds work in that abstraction,
and go back and forth. Think of an elephant. Now an Asian elephant walking by a river. Your image probably was pretty abstract, then got more vivid, then re-adjusted to add the river. So you went from symbols to a more detailed model, like a generative model, but you probably have only had a few examples of elephants to generate from.
They are not: the technique of transfer learning is a strong prior and is now widely used.
It's a better approach if we want to develop generalisable learning systems, no?
Edit: I haven't read Turing's paper recently and I might be misquoting him, but I do think that was his general intuition, that you can't just learn the world by observing the world.
Given that AlphaZero just happened, we now know that isn't true, no?
But the paper is cryptic to me, overall ("Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm").
Also, the parent comment is paradoxical. Data is information, and that is something, so learning from data is learning from something.
So the network didn't have to learn anything about the rules of chess, shogi or Go, and indeed it did not. That knowledge was given to it directly. As far as I know, this is the done thing with most game-playing systems, especially ones for classic board games whose rules are usually very simple and easy to hand-code (so there's no real need to learn them from scratch).
It would be interesting to see if it's possible to machine-learn the _rules_ of a game (i.e. given a move, recognise it as legal or not). A quick scan of internet search results confirms my recollection that most published work in game-playing agents focuses on learning how to play well, rather than how to play in the first place (indeed "learning to play chess" is used to mean "learning to win" in many publications).
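As a toy version of that setup, one could generate (position, move) pairs labelled legal/illegal and train any off-the-shelf classifier on an encoding of the pair. A sketch of the data-generation side only, assuming the python-chess library is installed; nothing here comes from any published paper:

```python
import random
import chess

def random_position(max_plies=40):
    """Play random legal moves from the start position to get a varied board."""
    board = chess.Board()
    for _ in range(random.randrange(max_plies)):
        if board.is_game_over():
            break
        board.push(random.choice(list(board.legal_moves)))
    return board

def labelled_moves(board, n_illegal=5):
    """Yield (board_fen, move_uci, label): 1 for legal moves, 0 for illegal ones."""
    legal = list(board.legal_moves)
    for move in legal:
        yield board.fen(), move.uci(), 1
    squares = list(chess.SQUARES)
    for _ in range(n_illegal):
        move = chess.Move(random.choice(squares), random.choice(squares))
        if move not in legal:
            yield board.fen(), move.uci(), 0

# A small labelled dataset; a model would then be trained on some encoding
# of (position, move) to predict legality, i.e. to learn the rules themselves.
dataset = [row for _ in range(100) for row in labelled_moves(random_position())]
print(len(dataset), dataset[0])
```

The interesting question is how much data a learner needs before it stops proposing illegal moves, compared to the one line of hand-coded rules the engine-based systems get for free.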
data -> information -> knowledge -> wisdom
Learning from nothing != general AI.
Totally false. By "every single task" you probably mean "every deep learning success story", which would be tautological.
There are tons of tasks where deep learning doesn't quite work yet and you still have to hand-craft your features.
Example from my domain: Try learning a classifier for malware which uses just raw binaries as input.
Regarding the articles you mentioned:
Using a ROC curve for evaluation in this case is a red flag, because it doesn't take the data imbalance into account. A precision-recall curve is far more suitable. You can have great AUC on the ROC curve while precision is near zero in highly imbalanced problems such as malware detection. Precision is the probability that a positive detection is a true positive, which is usually the measure you are most interested in.
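A minimal sketch of that divergence, using synthetic data and scikit-learn; the ~1% positive rate is an arbitrary assumption meant to loosely mimic a detection setting:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Heavily imbalanced synthetic problem: roughly 1% positives.
X, y = make_classification(n_samples=100_000, n_features=20,
                           weights=[0.99, 0.01], class_sep=0.8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# ROC AUC is insensitive to the class ratio; average precision (a PR-curve
# summary) is not, so the two can tell very different stories here.
print("ROC AUC:          ", roc_auc_score(y_test, scores))
print("Average precision:", average_precision_score(y_test, scores))
```

On data like this the ROC AUC tends to look flattering while the average precision is markedly lower, which is the gap the comment is pointing at.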
The problem is that precision changes if you change the class priors. Because of that, the results are always very dataset specific.
With that said, I am not saying that machine learning or neural networks do not work on this task. They just don't work in the end-to-end manner where you feed raw binaries into some generic architecture, as we can do with images in some tasks.
I wonder what examples you have in mind btw, other than self-play.
I think the first part of the sentence should be that the brain doesn't have innate weight sharing (as stated at the end of the sentence), not that it is not convolutional. I believe the convolutional structure is actually copied from the visual cortex (but with no weight sharing, as far as we know).
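One way to see the distinction is by counting parameters: a convolutional layer reuses one small kernel at every spatial position, while a "locally connected" layer has the same local receptive fields but its own kernel per position (the arrangement attributed to the visual cortex above). A back-of-the-envelope sketch with made-up layer sizes:

```python
in_channels, out_channels = 3, 16
kernel_h = kernel_w = 3
feat_h = feat_w = 32  # number of output positions along each spatial axis

# Convolution: one kernel shared across every position.
conv_params = out_channels * in_channels * kernel_h * kernel_w

# Locally connected: same receptive fields, separate kernel per position.
local_params = feat_h * feat_w * conv_params

print("shared (convolutional):     ", conv_params)   # 432
print("unshared (locally connected):", local_params)  # 442368
```

The thousand-fold parameter gap is the price of dropping weight sharing, which is part of why artificial nets keep it even if brains don't.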
It is dangerously confusing to reapply neural-net terminology to neuronal nets, isn't it? The weight of a kernel of biological neurons, what is that supposed to mean?
If you haven't stopped reading yet, please consider: in case, as I have to assume, you mean there is a specific ensemble of neurons that represents a kernel of given weights corresponding to exactly one area of the retina, then isn't sharing between "pixels" achieved simply by the eye's jittering?
For better or worse, assume I'm the adversary in a GAN and ignore me if it doesn't make sense.
I.e., these seemingly really complex models are actually biased to find simpler solutions that generalize well, in a way that often turns out to work better than trying to explicitly learn a simple model.
In the article, LeCun & Manning argue this paradigm has some limitations, and I do agree. I think the field will evolve toward systems that combine probabilistic logic-based engines (which represent formal causal reasoning) with lots of deep models (which represent intuition, hypothesis generation and specialized tasks like vision).
Geoff Hinton's Coursera course goes pretty deep on the unsupervised models
Reinforcement learning probably counts too (Deep Q Learning, Policy Gradient, Actor-Critic networks might be equivalent to GANs?)