Visual Introduction to Self Supervised Learning (amitness.com)
219 points by amitness on Feb 28, 2020 | 12 comments



Some really interesting work lately on "contrastive" learning, where accuracy is getting on par with supervised learning, e.g. https://arxiv.org/abs/2002.05709


Yup, this paper is really interesting. I'm currently reading it and will publish an illustrated post on it soon.


Here is the illustrated post on SimCLR: https://amitness.com/2020/03/illustrated-simclr/


For those of us out of the loop, could you summarize the idea of contrastive learning as a whole?


A fully illustrated article [1]

And Lilian Weng's blog post on self-supervision [2]

... CPC is ... translating a generative modeling problem to a classification problem... uses cross-entropy loss to measure how well the model can classify the "future" representation amongst a set of unrelated "negative" samples...

[1] https://ankeshanand.com/blog/2020/01/26/contrative-self-supe...

[2] https://lilianweng.github.io/lil-log/2019/11/10/self-supervi...
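
To make that concrete, here is a minimal sketch of that kind of loss (InfoNCE-style) in PyTorch. The cosine similarity and temperature are SimCLR-style choices on my part; CPC itself scores pairs with a log-bilinear product, but the structure is the same: the positive has to win a softmax over the negatives. All names and shapes are made up for illustration.

    import torch
    import torch.nn.functional as F

    def info_nce_loss(anchor, positive, negatives, temperature=0.1):
        # anchor:    (B, D)    context representation
        # positive:  (B, D)    the "future" representation to identify
        # negatives: (B, K, D) unrelated samples
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        negatives = F.normalize(negatives, dim=-1)
        pos = (anchor * positive).sum(-1, keepdim=True)      # (B, 1)
        neg = torch.einsum('bd,bkd->bk', anchor, negatives)  # (B, K)
        logits = torch.cat([pos, neg], dim=1) / temperature  # (B, 1+K)
        # The "correct class" is always index 0, i.e. the positive
        labels = torch.zeros(anchor.size(0), dtype=torch.long)
        return F.cross_entropy(logits, labels)

    loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128),
                         torch.randn(8, 16, 128))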


Thanks!


One variant of the task is: given a crop of an image (or a short audio snippet, etc.), can you find the matching crop from the same image within a set containing a lot of negative samples (crops of other, unrelated images)?

To succeed, the encoder needs to extract the underlying, useful information contained in the patch (so-called slow features) and discard the noise, as this makes the retrieval much easier.

This yields an encoder that gives pretty good representations of your inputs; you can then fine-tune some additional layers on top of it for your final task.
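
Roughly, the matching step looks like this (toy PyTorch sketch; the encoder is a made-up stand-in for a real conv net, and the shapes are arbitrary):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Stand-in encoder; a real one would be a conv net (placeholder)
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 64))

    query = torch.randn(1, 3, 16, 16)        # crop from some image
    candidates = torch.randn(32, 3, 16, 16)  # 1 true match + 31 negatives

    q = F.normalize(encoder(query), dim=-1)       # (1, 64)
    c = F.normalize(encoder(candidates), dim=-1)  # (32, 64)
    best = (q @ c.t()).argmax(dim=1)  # index of the best-matching crop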



So instead of image annotation, self-supervised learning uses image manipulation to train a model. Then what? Is this network then used for the original task at hand, which would have required human annotations, or is it only for these made-up tasks?


You then add a few additional layers on top, and you train those new layers in the classic supervised way. But because a lot has already been learned, you need way fewer labels.
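
Something like this (a minimal sketch; the encoder is a dummy stand-in for the pretrained network, and all dimensions are made up):

    import torch
    import torch.nn as nn

    # Stand-in for a self-supervised pretrained encoder (placeholder)
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    for p in encoder.parameters():
        p.requires_grad = False  # freeze the pretrained weights

    head = nn.Linear(128, 10)    # the new layers, trained with labels
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    images = torch.randn(16, 3, 32, 32)   # a small labeled batch
    labels = torch.randint(0, 10, (16,))

    loss = nn.functional.cross_entropy(head(encoder(images)), labels)
    opt.zero_grad(); loss.backward(); opt.step()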


Two main things you can do:

1) Transfer learning -- start with the self-supervised model and either fine-tune its parameters or freeze them and add another layer trained for your task (with way fewer parameters/labels needed, since the model has already learned about the input distribution)

2) Nearest neighbor/clustering -- no need to label all classes; simply fetch similar examples (e.g. find semantically similar sentences in a corpus). A quick sketch below.
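
Sketch of (2), assuming you already have embeddings from the frozen encoder (the data here is a random placeholder):

    import torch
    import torch.nn.functional as F

    # (N, D) corpus embeddings from the frozen encoder (placeholder data)
    emb = F.normalize(torch.randn(1000, 128), dim=-1)
    query = F.normalize(torch.randn(1, 128), dim=-1)

    sims = query @ emb.t()              # cosine similarities, (1, N)
    topk = sims.topk(5, dim=1).indices  # indices of the 5 nearest neighbors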


Optimistically, if a self-supervised algorithm is capable of understanding a concept, it shouldn't need all that many examples to be useful. Ideally you could show it what cats look like (with just a couple of examples) and ask it to find more of them.
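
That's roughly a nearest-centroid classifier over the learned representations. A toy sketch, with random placeholder embeddings standing in for a real encoder's output (the 0.8 threshold is arbitrary):

    import torch
    import torch.nn.functional as F

    # Average a couple of "cat" embeddings into a prototype, then flag
    # anything whose embedding is close enough to it.
    cat_examples = F.normalize(torch.randn(3, 128), dim=-1)  # 3 labeled cats
    prototype = F.normalize(cat_examples.mean(0, keepdim=True), dim=-1)

    corpus = F.normalize(torch.randn(500, 128), dim=-1)
    sims = (corpus @ prototype.t()).squeeze(1)   # (500,)
    found = (sims > 0.8).nonzero().squeeze(1)    # indices of likely cats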




