
Self-Supervised Learning [pdf] - Anon84
https://project.inria.fr/paiss/files/2018/07/zisserman-self-supervised.pdf
======
l-m-z
This is from yesterday's (June 15th) workshop on self-supervised learning
at ICML. The video of this talk can be seen here:
[https://www.facebook.com/icml.imls/videos/2030095370631729/](https://www.facebook.com/icml.imls/videos/2030095370631729/)
(not sure if the video is also available on platforms other than Facebook)

~~~
hadsed
In fact, I found the second and third talks more interesting than the first!
Sadly, they are not included in the slides from the OP.

------
varelse
See also "The Revolution Will Not Be Supervised" from CS294-158...

[https://www.youtube.com/watch?v=PX11C5Vfo9U](https://www.youtube.com/watch?v=PX11C5Vfo9U)

~~~
picozeta
Thanks for that, this is an interesting talk.

------
planckscnst
If we can use GANs to produce believable images of a subject, and we can
identify whether a video frame can exist between two others, it seems like we
can produce arbitrarily high-framerate videos that look like they were
captured that way. It also means we can make slow-motion video when we didn't
record at a high framerate. With that, colorization, and possibly similar
tools for audio, we might see some amazing recreations of classic
performances.
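As a sketch of why this is "self-supervised": the training data comes for free from any video, since every middle frame is its own label. Here is a toy triplet pipeline (numpy only; the naive linear blend stands in for a learned, e.g. GAN-based, interpolator):

```python
import numpy as np

def make_video(n_frames=24, size=16):
    """Synthetic 'video': a bright square sliding across a black background."""
    frames = np.zeros((n_frames, size, size), dtype=np.float32)
    for t in range(n_frames):
        x = t % (size - 4)
        frames[t, 6:10, x:x + 4] = 1.0
    return frames

def frame_triplets(frames):
    """Self-supervised dataset: (prev, next) inputs with the true middle frame
    as the free label -- no human annotation required."""
    return [(frames[i - 1], frames[i + 1], frames[i])
            for i in range(1, len(frames) - 1)]

def interpolate(prev, nxt):
    """Naive in-betweening baseline: average the neighbours. A real system
    trained on these triplets would hallucinate sharper, more plausible motion."""
    return (prev + nxt) / 2.0

video = make_video()
triplets = frame_triplets(video)
errors = [np.abs(interpolate(p, n) - mid).mean() for p, n, mid in triplets]
print(f"{len(triplets)} triplets, mean L1 error of linear baseline: {np.mean(errors):.4f}")
```

The same triplet construction is what lets a GAN learn in-betweening without any labels: the discriminator only has to judge whether the predicted middle frame is plausible.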

~~~
gwern
Yep, you certainly can. 'In-betweening', like superresolution, has been a GAN
thing for years now, because triplets of frames are a clean dataset but you
also care more about perceptual plausibility than pixel error. People use in-
betweening GANs to make things like 60 FPS anime. (Not entirely sure why, but
they do.)

~~~
ionionionioa
>People use in-betweening GANs to make things like 60 FPS anime

Animation seems like an especially poor fit to me, since the actual framerate
is often much lower than the video's framerate. Framerate can vary between
scenes and even within different parts of one scene! Typically the background
is very low framerate (sometimes as low as 4 FPS), the foreground is higher
framerate (typically 8-12 FPS), while pans, zooms, and 3D elements are at a
full 24 FPS. Most of the additional frames from interpolation will therefore
be _exact duplicates_ of other frames.

This does little to improve the smoothness of the video. It just adds in
artifacts. And, since the frames between two drawings will be interpolated
while frames within one drawing will be unchanged, the framerate will be
inconsistent and appear as judder.
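A toy model makes the arithmetic concrete (assuming a 24 FPS video animated "on threes", i.e. each drawing held for 3 frames):

```python
# Toy model of "animation on threes": at 24 FPS each drawing is held for
# 3 frames, so the drawing rate is only 8 drawings per second.
drawings_per_second = 8
hold = 3                      # frames per drawing at 24 FPS
source = [d for d in range(drawings_per_second) for _ in range(hold)]  # 24 frames

# Naive 2x interpolation: insert one new frame between each consecutive pair.
# If the two neighbours show the same drawing, the "interpolated" frame is
# just another exact duplicate.
pairs = list(zip(source, source[1:]))
duplicates = sum(1 for a, b in pairs if a == b)
new_frames = len(pairs)
print(f"{duplicates}/{new_frames} inserted frames are exact duplicates")
```

In this toy case roughly 70% of the inserted frames duplicate an existing drawing, which is the judder being described.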

Interpolation will never work for 2D animation. No way, no how. Any worthwhile
system will need to modify existing frames rather than simply adding more in
between the original frames. I can understand interpolation for live action
(though I still dislike it), but it is absolutely god-awful for animation.

~~~
gwern
I think that's wrong: the whole point of GANs is that they're quite
intelligent and good at faking outputs. I've seen interpolated/in-betweened
videos (mostly but not entirely live-action), and it looks realistic to me.

The reason I'm somewhat skeptical is that just because something looks
realistic doesn't mean it is what was _intended_. It's a version of the 'zoom
in, enhance, enhance' problem. It's like the _Hobbit_ problem: a GAN could
perfectly well fake a 48FPS version of a 24FPS cut of the _Hobbit_ such
that you couldn't tell it wasn't the actual 48FPS version that Peter
Jackson shot... but the problem is that it's 48FPS, and that just feels wrong
for cinema. Animators, anime included, use the limitations of framerate,
deliberately switching between animating 'on twos', 'on threes', and so on,
with framerate reductions chosen intentionally for action segments, sakuga,
and other reasons. An anime isn't simply a film which was unavoidably shot
with a too-low framerate.

(This is less true of superresolution: in most cases, if an anime studio could
have afforded to animate at a higher resolution originally, they would have;
and you're not compromising any 'artistic vision' if you use a GAN to do a
good upscaling job instead of a lousy bilinear upscale built into your video
player.)

~~~
ionionionioa
>interpolated/in-betweened

That's the problem: no matter how smart your algorithm is, you cannot make
animation look smooth by only adding frames. Not even human animators could do
that.

The framerate of animation is irrelevant. What matters is the number of
_drawings_ per second, not the number of frames. An intelligent system would
interpolate between drawings, which would often require modifying or deleting
frames from the source.

I'm not some purist claiming that this is an evil technology. It just plain
doesn't apply to animation, except for pans or the rare scene animated at a
full 24 FPS.

~~~
gwern
I'm not following. (If it doesn't apply at all, how is anyone doing it...?) Of
course you can identify drawings per second, much the same way a monitor can
display a 24FPS video at 120Hz without needing to be an 'intelligent system':
you increase or decrease the number of duplicates as necessary. You in-between
pairs of different frames, replacing all the identical ones which are simply
displaying the same drawing.
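That dedupe-then-in-between pipeline can be sketched in a few lines (numpy only; the linear blend is a placeholder for a learned in-betweener):

```python
# Sketch of the approach described above: collapse runs of identical frames
# down to unique drawings, in-between only between *different* drawings,
# then re-time to the target framerate.
import numpy as np

def unique_drawings(frames, tol=1e-6):
    """Collapse consecutive duplicate frames; return drawings and hold lengths."""
    drawings, holds = [frames[0]], [1]
    for f in frames[1:]:
        if np.abs(f - drawings[-1]).max() <= tol:
            holds[-1] += 1
        else:
            drawings.append(f)
            holds.append(1)
    return drawings, holds

def inbetween(a, b, n):
    """Generate n frames blending drawing a into drawing b (placeholder for a
    learned, e.g. GAN-based, in-betweener)."""
    return [(1 - t) * a + t * b for t in np.linspace(0, 1, n, endpoint=False)]

def retime(frames, factor=2):
    drawings, holds = unique_drawings(frames)
    out = []
    for i in range(len(drawings) - 1):
        # Spread the whole hold of drawing i over smoothly interpolated frames.
        out.extend(inbetween(drawings[i], drawings[i + 1], holds[i] * factor))
    out.extend([drawings[-1]] * (holds[-1] * factor))
    return out

# 8 distinct "drawings", each held for 3 frames (animation on threes).
video = [np.full((4, 4), d, dtype=np.float32) for d in range(8) for _ in range(3)]
smooth = retime(video, factor=2)
print(len(video), "->", len(smooth), "frames")
```

The duplicates never reach the interpolator: only pairs of genuinely different drawings are in-betweened, which is exactly the point being made above.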

------
p1esk
Yann Lecun is also a fan of self-supervised learning:
[https://www.facebook.com/epflcampus/videos/yann-lecun-self-supervised-learning-could-machines-learn-likes-humans/1960325127394608/](https://www.facebook.com/epflcampus/videos/yann-lecun-self-supervised-learning-could-machines-learn-likes-humans/1960325127394608/)

------
p1esk
So, is there any difference between "self-supervised" and "unsupervised"
learning?

~~~
jokoon
That reminds me of those neural network models that learn to modify
themselves, a little like an ML algorithm that learns which neural network
works best.

I think Google did something like this some years ago?

~~~
neruotablet
You mean Neuroevolution? I.e. using evolutionary algorithms
([https://youtu.be/L--IxUH4fac](https://youtu.be/L--IxUH4fac)) to evolve NNs?

~~~
nl
No. The Google technique was reinforcement learning based and didn't use
evolutionary algorithms at all.

[https://ai.google/research/pubs/pub45826](https://ai.google/research/pubs/pub45826)

------
shgidi
A post series that also summarizes this subject:
[https://link.medium.com/IhOvrqFEzX](https://link.medium.com/IhOvrqFEzX)

------
hbarka
I’m nowhere near this field, but for the experts in here: does self-supervised
learning change the paradigm for the approach to self-driving cars?

------
ec109685
At what point will we be able to deliver low-resolution video and have the
system make up, on the fly, a believable high-resolution version of it?

~~~
felipellrocha
ENHANCE!

~~~
derefr
Nah, the CSI "enhance" thing is "multi-frame super-resolution image recovery",
a different (though related) ML technique.

Speaking of, though: you'd think by now that security cameras that capture
footage at very low framerates for the sake of storage space would have ASICs
in them using those models to combine a bunch of grainy input frames
into a stream of fewer, but very _good and clean_, frames.
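A toy illustration of why multi-frame fusion helps (assuming a static, perfectly aligned scene; real systems have to earn that alignment via sub-pixel registration):

```python
# Averaging N independently-noisy captures of the same scene cuts the noise
# standard deviation by roughly sqrt(N). Real multi-frame super-resolution
# additionally exploits sub-pixel shifts between frames to recover detail.
import numpy as np

rng = np.random.default_rng(0)
scene = rng.uniform(0, 1, size=(32, 32))               # ground-truth static scene
noisy = scene + rng.normal(0, 0.2, size=(16, 32, 32))  # 16 grainy captures

single_err = np.abs(noisy[0] - scene).mean()
fused = noisy.mean(axis=0)                             # naive temporal fusion
fused_err = np.abs(fused - scene).mean()
print(f"one frame: {single_err:.3f} MAE, fused: {fused_err:.3f} MAE")
```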

Any hardware on the market with this capability yet?

~~~
ClassyJacket
It makes sense for entertainment but not for security cameras - then you're
filling it in with made up information. A security camera is supposed to be a
record of truth.

~~~
oneshot908
Imagine a world where low information sorts interpret a sampling of possible
hi-res reconstructions from low-res security videos as ground truth. That to
me is far scarier than the OpenAI and MIRI fear-mongering about GPT-2.

------
amthewiz
The DNN-based techniques are new, but the concept of models that can fill in
the blanks is old. It used to be called content-addressable memory or
autoassociative memory.

------
person_of_color
Is this the game changer we are looking for?

~~~
p1esk
What are you looking for?

------
suyash
Mind-blowing progress in the field of self-supervised learning.

