Some really interesting work lately on "contrastive" learning, where accuracy is getting close to that of supervised learning, e.g. https://arxiv.org/abs/2002.05709
.. CPC is .. translating a generative modeling problem to a classification problem... uses cross-entropy loss to measure how well the model can classify the “future” representation amongst a set of unrelated “negative” samples...
(One variant of) the task is: given a crop of an image (or a short audio snippet, etc.), can you pick out, from a set containing many negative samples (crops of other, unrelated images), the matching crop that comes from the same image?
To succeed, the encoder needs to extract the underlying, useful information contained in the patch (the so-called slow features) and discard the noise, since that makes the retrieval much easier.
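Roughly, that "classify the matching crop among negatives with cross-entropy" objective looks like the sketch below. This is a generic InfoNCE-style loss, not the exact code from the paper; `encoder`, the crop variables, and the temperature value are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb, positive_emb, temperature=0.1):
    # anchor_emb, positive_emb: (batch, dim) embeddings of two crops of the same images
    a = F.normalize(anchor_emb, dim=1)
    p = F.normalize(positive_emb, dim=1)
    logits = a @ p.t() / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # i-th positive matches i-th anchor
    # Cross-entropy: pick the true matching crop among all other (negative) crops in the batch
    return F.cross_entropy(logits, targets)

# Usage sketch (encoder and crop batches are assumed to exist):
# emb1 = encoder(crops_view1)   # (batch, dim)
# emb2 = encoder(crops_view2)
# loss = info_nce_loss(emb1, emb2)
```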
This yields an encoder that gives pretty good representations of your inputs, and you can then fine-tune some additional layers on top of it for your final task.
So instead of image annotation, self-supervised learning performs image manipulation to train a model. Then what? Is this network then piped into the original task at hand, which would have required human annotations, or is it simply for these made-up tasks?
You then add a few additional layers on top, and you train those new layers in a classic supervised way.
But because a lot has already been learned, you need way fewer labels.
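A minimal sketch of that step: freeze the pretrained encoder and train only a small head on the (much smaller) labeled set. The encoder, sizes, and dummy data below are stand-ins, not anyone's actual setup.

```python
import torch
import torch.nn as nn

# Stand-in for the network produced by contrastive pretraining
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False            # keep the pretrained features fixed

num_classes = 10
head = nn.Linear(128, num_classes)     # only these weights get trained

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Tiny random "labeled set" stands in for the small annotated data you now need
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, num_classes, (64,))

for _ in range(10):
    with torch.no_grad():
        feats = encoder(images)        # reuse the frozen representations
    loss = loss_fn(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```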
1) Transfer learning -- start with the self-supervised model and either fine-tune its parameters or freeze them and add another layer to train for your task (with way fewer parameters/labels needed, since the model has already learned about the input distribution)
2) Nearest neighbor/clustering -- no need to label all classes; simply fetch similar examples (e.g. find semantically similar sentences in a corpus).
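For option 2, the pretrained encoder's embeddings can be used directly for retrieval, with no task-specific labels at all. A rough sketch with cosine similarity over made-up embeddings (in practice the vectors would come from the encoder):

```python
import torch
import torch.nn.functional as F

corpus_emb = F.normalize(torch.randn(1000, 128), dim=1)   # embeddings of a corpus
query_emb = F.normalize(torch.randn(1, 128), dim=1)       # embedding of the query

scores = (query_emb @ corpus_emb.t()).squeeze(0)          # cosine similarities
top_scores, top_idx = scores.topk(5)                       # 5 most similar items
print(top_idx.tolist())
```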
Optimistically, if a self-supervised algorithm is capable of understanding a concept, then it shouldn't need all that many examples to be useful. Ideally you could just show it what cats look like (with only a couple of examples) and ask it to find more of them.
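One simple way that "couple of examples" idea could work: average the embeddings of the few known cats into a prototype and flag anything whose embedding is close enough. The data and the 0.8 cutoff below are purely illustrative.

```python
import torch
import torch.nn.functional as F

cat_examples = F.normalize(torch.randn(5, 128), dim=1)      # embeddings of ~5 known cats
prototype = F.normalize(cat_examples.mean(dim=0, keepdim=True), dim=1)

candidates = F.normalize(torch.randn(200, 128), dim=1)       # embeddings of unlabeled images
similarity = (candidates @ prototype.t()).squeeze(1)
likely_cats = (similarity > 0.8).nonzero(as_tuple=True)[0]   # 0.8 is an arbitrary cutoff
print(likely_cats.tolist())
```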