
Due to how VQGAN + CLIP works, it can't memorize its inputs in the way language models like GPT-3 do.

VQGAN does the generation work; CLIP just scores whether the output matches the prompt, the latents are updated to improve that score, and the process repeats. Here's a good technical writeup: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqg...

And of course, the most common VQGAN was trained on ImageNet, which likely doesn't include every movie poster as training data. (They could be in CLIP's training data, though.)
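
To make that loop concrete, here's a rough PyTorch sketch of the optimization. It uses OpenAI's real clip package, but the decoder is a toy stand-in for the actual taming-transformers VQGAN (labeled as such below), and the latent shape and hyperparameters are only illustrative:

    import torch
    import clip  # OpenAI's CLIP package

    device = "cpu"  # kept on CPU for simplicity; clip.load defaults to fp16 on CUDA
    clip_model, _ = clip.load("ViT-B/32", device=device)

    # Encode the prompt once; it stays fixed for the whole run.
    prompt = "a movie poster for Charlie and the Chocolate Factory"
    with torch.no_grad():
        text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Toy stand-in for the real VQGAN decoder (the actual one comes from
    # taming-transformers); it only exists here so the loop runs end to end.
    toy_decoder = torch.nn.ConvTranspose2d(256, 3, kernel_size=14, stride=14).to(device)
    for p in toy_decoder.parameters():
        p.requires_grad_(False)

    # The latents are the only thing being optimized; both models stay frozen.
    latents = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([latents], lr=0.05)

    for step in range(200):
        image = torch.sigmoid(toy_decoder(latents))  # 1 x 3 x 224 x 224
        image_features = clip_model.encode_image(image)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)

        # CLIP only scores the image against the prompt; the gradient of that
        # score is what nudges the latents toward "looks more like the prompt".
        loss = -(image_features * text_features).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

So nothing is ever copied out of a training image directly: the only signal is "does this render score higher against the text embedding than the last one did."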




What do you suppose the mechanism is for the Charlie and the Chocolate Factory image having a golden ticket held aloft by somebody’s right hand, with a person in a purple outfit and top hat? The page says:

> a brief text description of a movie

However, apart from the existence of a golden ticket, I wouldn't expect those details to make it into a brief description of the film. And yet there's an original poster matching those details that the VQGAN + CLIP generated image seems to draw from.


Even more convincing to me is the face of John Malkovich being on the poster of Being John Malkovich. Unless the prompt includes a pretty accurate description of his face (hairstyle, gender, age, facial hair, skin color), the model must have encountered his appearance in its training set.


>(hairstyle, gender, age, facial hair, skin color)

That's not enough to reconstruct John Malkovich's face from text; you'd also need minute facial-feature parameters (eye shape, nose shape, eye-to-nose distances, etc.).


Because he is famous on the Internet, CLIP “knows” what John Malkovich looks like. Or, more accurately: what an image people would label “John Malkovich” feels like.
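
A quick way to see what that means in practice is CLIP's zero-shot scoring, sketched below with OpenAI's clip package (poster.jpg is a hypothetical local copy of the generated image):

    import torch
    import clip
    from PIL import Image

    device = "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Score one image against a few candidate captions; the caption CLIP
    # prefers is, in effect, what it "thinks" the image shows.
    image = preprocess(Image.open("poster.jpg")).unsqueeze(0).to(device)
    captions = ["a photo of John Malkovich",
                "a photo of an anonymous middle-aged man",
                "a movie poster"]
    text = clip.tokenize(captions).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)

    for caption, p in zip(captions, probs[0].tolist()):
        print(f"{caption}: {p:.3f}")

If the first caption wins by a wide margin, the generated face has landed in the region of embedding space that images labeled "John Malkovich" occupy, which is all the generator needs to reproduce his likeness.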


Wouldn't the most obvious explanation be a description that mentions Willy Wonka's chocolate factory, which doesn't really turn up anywhere in the training data except the original film media?

Star Wars is an interesting example because it appears to include elements lifted directly from the film (bits of a stormtrooper's body) alongside a princess who definitely isn't Leia. The algorithm might be creating things from scratch at a high level, but the constituent elements are pretty clearly close reproductions of parts of the source material.


It would be interesting to see how well the newer CLIP-guided diffusion model works. This is a collection of what it generates with the prompt 'mad max alien spacecraft landed in the desert':

https://i.imgur.com/A1sAaev.jpg





