
Contrastive Self-Supervised Learning - ankeshanand
https://ankeshanand.com/blog/2020/01/26/contrative-self-supervised-learning.html
======
fxtentacle
They kind of sweep it under the rug that for the PASCAL VOC tests,
unsupervised learning was only used as pre-training and was then followed by
supervised training before evaluation. That's the difference between "this
course will teach you Spanish" and "this course is a good preparation to do
before you start your actual Spanish course".

Also, while it is laudable that they attempt to learn slow, higher-level
features, the result of contrastive loss functions is still very much
detail-focused; it is just detail-focused in a translationally invariant way.

A common problem in image classification is that the AI learns to recognize
high-level fur patterns instead of learning the shape of the animal. Using
contrastive loss terms like in their example will drive the network towards
producing the same feature vector for adjacent pixels, meaning that the fur
pattern detector needs to become translation-invariant. But the contrastive
loss term will NOT prevent the network from recognizing the fur rather than
the shape, as is claimed in this article.

~~~
ankeshanand
Sorry if it wasn't clear; I do mention the linear classification protocol
several times in the post. If you want to evaluate performance on a
classification task, you have to show it labels during evaluation, otherwise
it's an impossible task. Note that the encoder is frozen during evaluation,
and only a linear classifier is trained on top. Now, even when evaluated on a
limited set of labels (as low as 1%), contrastive pretraining outperforms
purely supervised training by a large margin (check out Figure 1 in the
Data-Efficient CPC paper:
[https://arxiv.org/abs/1905.09272](https://arxiv.org/abs/1905.09272)).
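
For concreteness, here is a minimal PyTorch-style sketch of that linear
evaluation protocol. The encoder, feature dimension, and data loader names
are placeholders assumed for illustration, not code from the post:

```python
# Minimal sketch of the linear evaluation protocol (PyTorch-style).
# `pretrained_encoder`, `feature_dim`, `num_classes`, and `labeled_loader`
# are illustrative placeholders.
import torch
import torch.nn as nn

encoder = pretrained_encoder()           # encoder trained with the contrastive objective
for p in encoder.parameters():
    p.requires_grad = False              # the encoder stays frozen during evaluation
encoder.eval()

linear_probe = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for images, labels in labeled_loader:    # e.g. only 1% of the labels
    with torch.no_grad():
        features = encoder(images)       # no gradients flow into the encoder
    loss = criterion(linear_probe(features), labels)
    optimizer.zero_grad()
    loss.backward()                      # only the linear classifier is updated
    optimizer.step()
```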

I did not get the second part unfortunately, could you elaborate more and
clarify if you are talking about a specific paper?

~~~
fxtentacle
The problem that I see with supervised training of a linear classifier after
unsupervised training is that, if the unsupervised network is large enough, it
allows the supervised trainer to pick out whichever components happen to work.
As shown in [1], that can lead to randomly initialized networks working well,
too, meaning that this does not necessarily show that the unsupervised
training produced useful features.

I would instead suggest training a categorization classifier unsupervised as
well, for example using a mutual information loss with the correct number of
categories, as suggested in [2]. Afterwards, one can deduce the mapping
between the categories learnt unsupervised and the ground-truth categories to
allow evaluation. That way, good results clearly demonstrate a good
unsupervised training method.
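
A rough sketch of that evaluation step, assuming the common Hungarian-matching
approach (the function and variable names are mine, not from [2]):

```python
# Matching unsupervised cluster assignments to ground-truth classes with the
# Hungarian algorithm; names and structure are illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(cluster_ids, true_labels, num_classes):
    # Confusion matrix between predicted clusters (rows) and true classes (columns).
    agreement = np.zeros((num_classes, num_classes), dtype=np.int64)
    for c, t in zip(cluster_ids, true_labels):
        agreement[c, t] += 1
    # One-to-one cluster-to-class assignment that maximizes total agreement.
    rows, cols = linear_sum_assignment(agreement, maximize=True)
    mapping = dict(zip(rows, cols))
    remapped = np.array([mapping[c] for c in cluster_ids])
    return (remapped == np.asarray(true_labels)).mean()
```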

The problem I meant in the second part is that most networks trained for
object recognition work on low-level features such as colors and textures, as
shown in [3]. The 3D-printed turtle in that paper clearly has a turtle shape
and arrangement and looks overwhelmingly like a turtle to humans. But its
high-frequency surface details are those that the neural network associates
with a rifle, which is why those networks are fooled even on photos taken from
varying perspectives.

Training a network with a loss that ensures the local area of an image
produces features highly correlated with the global features of the same
image does not avoid this problem, because the high-frequency patterns that
the AI erroneously uses for detection are present at both the local and the
global scale. Sadly, I don't have any idea how to improve on that either.
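
For reference, the kind of local-to-global objective discussed above is
typically an InfoNCE-style loss; a minimal sketch, with shapes and names that
are illustrative rather than taken from any of the papers:

```python
# Sketch of an InfoNCE-style local-to-global contrastive loss.
import torch
import torch.nn.functional as F

def local_global_info_nce(local_feats, global_feats, temperature=0.1):
    # local_feats:  (N, D) one local patch feature per image
    # global_feats: (N, D) the global summary feature of the same N images
    local_feats = F.normalize(local_feats, dim=1)
    global_feats = F.normalize(global_feats, dim=1)
    logits = local_feats @ global_feats.t() / temperature  # (N, N) similarities
    targets = torch.arange(local_feats.size(0))            # positives on the diagonal
    # Each local patch must pick out the global feature of its own image
    # from among the global features of the other images in the batch.
    return F.cross_entropy(logits, targets)
```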

[1] What's Hidden in a Randomly Weighted Neural Network?
[https://arxiv.org/abs/1911.13299](https://arxiv.org/abs/1911.13299)

[2] Invariant Information Clustering for Unsupervised Image Classification and
Segmentation
[https://arxiv.org/abs/1807.06653](https://arxiv.org/abs/1807.06653)

[3] Synthesizing Robust Adversarial Examples
[https://arxiv.org/abs/1707.07397](https://arxiv.org/abs/1707.07397)

------
jph00
There's a lot to like in this article, but I don't quite agree with the setup.
I think it's better to think of "contrastive" approaches as being orthogonal
to basic self-supervised learning methods - they represent an additional piece
you can add to your loss function that results in very significant
improvements. This approach can be combined with existing self-supervised
pretext tasks.
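
As a purely hypothetical illustration of that "additional piece you can add to
your loss function" (the loss names and the weighting are assumptions of mine,
not fast.ai code):

```python
# Hypothetical: a contrastive term added on top of an existing self-supervised
# pretext loss (e.g. rotation prediction). `pretext_loss`, `contrastive_loss`,
# and the weight are illustrative placeholders.
def combined_loss(model, batch, contrastive_weight=1.0):
    return pretext_loss(model, batch) + contrastive_weight * contrastive_loss(model, batch)
```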

I've discussed these ideas here, for those that are interested in learning
more:
[https://www.fast.ai/2020/01/13/self_supervised/](https://www.fast.ai/2020/01/13/self_supervised/)

~~~
jph00
BTW, one thing which makes it a bit hard to get into self-supervised learning
is that the most common benchmarking task involves pretraining on Imagenet,
which is too slow and expensive for development.

I recently created a little dataset that is specifically designed to allow for
testing out self-supervised techniques, called Image网 ("Imagewang"). I'd love
to see some folks try it out, and submit strong baselines to the leaderboard:
[https://github.com/fastai/imagenette#image%E7%BD%91](https://github.com/fastai/imagenette#image%E7%BD%91)

~~~
allovernow
I might take you up on that. How does your dataset facilitate self-supervised
experimentation?

I've a good amount of experience playing with autoencoders but this is the
first I've heard of contrastive learning.

------
allovernow
Great post. For an ML engineer HN can be a goldmine sometimes! I've gotten a
bunch of ideas for work from submissions. The pace at which ML is expanding is
phenomenal. No doubt in part thanks to the open nature of arxiv. As the sum of
so many centuries of achievement, it really makes me proud to be human...and
I'm excited to watch as it changes the world.

------
bobosha
Great write-up. I especially liked the section on Contrastive Predictive
Coding; I think that's going to be the next iteration of ML.

~~~
p1esk
What’s the current iteration of ML?

~~~
bobosha
Neural networks (deep learning)

~~~
backpropaganda
CPC trains neural nets.

------
pequalsnp
I hadn’t heard of this before. Cool. Going to share this with my team on
Monday.

