
Improving YouTube video thumbnails with deep neural nets - jplevine
http://youtube-eng.blogspot.com/2015/10/improving-youtube-video-thumbnails-with_8.html
======
Scaevolus
I wonder if they included sexy images in their negative training sets -- many
videos accrue millions of views (and ad dollars) by having a few frames of
cleavage interspersed with other (often derivative) footage.

It would be great if their algorithm picked a thumbnail that reflected the
_entire_ video, not just a few frames specifically chosen to game people's
compulsive clicking.

~~~
nacs
Most of those are just manually selected thumbnails by the uploader. After
uploading, YT gives you 3-5 thumbnails you can choose from.

Also, partnered accounts are allowed to upload custom thumbnails (which can be
any image, not necessarily even a screenshot from the video).

~~~
JoshTriplett
> After uploading, YT gives you 3-5 thumbnails you can choose from.

Can you pick an arbitrary video frame, or only one of the suggested
thumbnails?

~~~
nacs
It automatically captures 3 different thumbnails (I guess using the algorithm
in OP) and lets you select any 1 of those 3.

~~~
lumpypua
I presume they use the image selection as training data too. If not, that
seems like awfully low-hanging data fruit.

------
Animats
It looks like they prefer images with a few large faces near the center of the
frame. That's probably the right answer for social media. (Plus a cat
recognizer.) Used on news footage, you probably get the talking head rather
than the news event.

~~~
anjc
We can't really guess how the NN is preferring images, but it looks to me like
it's preferring images with high entropy in certain regions.
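
One way to make "entropy in certain regions" concrete is the Shannon entropy of
each region's grey-level histogram. What follows is only a minimal sketch,
assuming the image is a plain 2D list of 0-255 grey levels; `region_entropy`
and `grid_entropies` are illustrative names, and nothing here is claimed to be
what the network in the post actually computes.

```python
import math
from collections import Counter

def region_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def grid_entropies(image, rows, cols):
    """Split a 2D list of grey levels into rows x cols tiles and
    return the histogram entropy of each tile as a nested list."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            tile = [image[y][x]
                    for y in range(r * height // rows, (r + 1) * height // rows)
                    for x in range(c * width // cols, (c + 1) * width // cols)]
            row_out.append(region_entropy(tile))
        result.append(row_out)
    return result
```

A flat tile scores 0 bits and a tile with many distinct grey levels scores
higher, so a per-tile map like this would show where the "busy" regions are.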

------
trjordan
There's an outside company that was working on this: Neon Labs
([https://www.neon-lab.com/](https://www.neon-lab.com/)).

Their insight is that there are not only "high-quality" images but also
positive ones. Positive images get more clicks than merely decent ones. I
wonder if that information is encoded in the RNN in some way.

(This is where I'd normally rant about RNNs and other ML techniques hiding
this information from their creators by locking it up inside the black box,
but I'll save that for another day.)

------
mutagen
They've got to be training on more inputs than mentioned. For example, is one
time, or a close cluster of times, in the video linked externally and
generating traffic? Grab the entire set of frames from that time period and
run it through the quality classifier; there might be iconic frames from that
section that people are looking for.
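
The idea above could be sketched roughly as follows. This is a guess at one
possible shape, not anything YouTube has described: `linked_times` is assumed
to be a list of externally linked timestamps in seconds, `frames` a
time-sorted list of `(timestamp, frame)` pairs, and `quality_score` a
hypothetical stand-in for the thumbnail quality classifier.

```python
from bisect import bisect_left, bisect_right

def busiest_window(linked_times, window):
    """Return (start, end) of the `window`-second span, starting at one of
    the linked timestamps, that contains the most linked timestamps."""
    times = sorted(linked_times)
    best_start, best_count = times[0], 0
    for i, start in enumerate(times):
        count = bisect_right(times, start + window) - i
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_start + window

def best_frame_in_window(frames, linked_times, window, quality_score):
    """Pick the highest-scoring frame inside the most-linked time window.

    quality_score is a hypothetical callable standing in for a real
    quality model. Returns a (timestamp, frame) pair or None."""
    if not linked_times:
        return None
    start, end = busiest_window(linked_times, window)
    stamps = [t for t, _ in frames]
    candidates = frames[bisect_left(stamps, start):bisect_right(stamps, end)]
    if not candidates:
        return None
    return max(candidates, key=lambda tf: quality_score(tf[1]))
```

The same function would work for re-watched segments by passing the replay
start times instead of externally linked ones.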

Are people re-watching a small segment of the video? Try classifying
individual frames from that segment or just before. Of course, those are often
action moments that result in smeared motion and artifacts and may not result
in a quality thumbnail.

These ideas also only come into play when a video has been live for a while,
after the uploader has initially picked a thumbnail. Maybe a "We have some new
thumbnail suggestions for you, take a look" alert or message?

------
needBigrPics
So, in an article about image processing, why not include nice big beautiful
images, that get even bigger when you click on them?

I click on the low detail inline images, and they stay the same disappointing
size and reveal no further detail.

They're all, like, 600px x 200px? Am I being greedy for wanting gigantic
images, upwards of 3000px wide?

I suppose it _is_ an article about thumbnails, after all, so maybe I shouldn't
be so surprised.

------
Nyetan
Seeing this run through an equivalent of the deep dream visualizer could be
really interesting -- what _are_ people looking for in thumbnails? I'm having
difficulty imagining what features would even be relevant in such a situation.

~~~
kylebgorman
I'm guessing: "sharpness" of image, good saturation, presence of (smiling?)
human faces, non-human mammals facing the camera, bare human skin (?)

(I agree that'd be cool.)
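
Two of the guessed features, sharpness and saturation, are easy to compute by
hand. The sketch below uses the variance of a Laplacian response as a common
sharpness heuristic and the mean HSV saturation from the standard library's
`colorsys`. The function names and the plain-list image format are
assumptions for illustration; there is no claim that the network in the post
uses either measure.

```python
import colorsys

def sharpness(gray):
    """Variance of a 4-neighbour Laplacian over a 2D list of grey levels.
    Higher values mean more fine detail (edges); a flat image gives 0."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    if not responses:  # image too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def mean_saturation(rgb_pixels):
    """Average HSV saturation (0..1) of (r, g, b) tuples with 0..255 channels."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
            for r, g, b in rgb_pixels]
    return sum(sats) / len(sats) if sats else 0.0
```

Faces, animals and skin would need an actual detector, so they are left out
here.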

------
mdpm
Meanwhile, I still can't edit a playlist while playing it.

edit: constructively put, isn't there simpler stuff still left to fix that
would improve the UX and match user patterns?

~~~
GhotiFish
When you have a big system, the most consistent argument against working on
one thing is that you should be working on something else. This is true for
everything in the system, because everyone has a different opinion on what
that something else is.

For example: why should YouTube spend time working on playlist playback when
it could instead spend time working on automatic categorization? Content
creators have to manually create playlists, even if they sequentially number
their videos. YouTube shouldn't waste its time on playlist editing when it
could be doing the right thing automatically.

