
Due to how VQGAN + CLIP works, it can't memorize its inputs in the way language models like GPT-3 do.

VQGAN does the generation work; CLIP just scores whether the output matches the prompt, the latents are updated to improve that score, and the process repeats. Here's a good technical writeup: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqg...

And of course, the most common VQGAN was trained on ImageNet, which likely doesn't include every movie poster as training data. (They could be in CLIP's training data, though.)
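
To make that loop concrete, here's a rough PyTorch sketch of the optimization. It uses OpenAI's real clip package, but the decoder is a toy stand-in for the actual taming-transformers VQGAN (labeled as such below), and the latent shape and hyperparameters are only illustrative:

    import torch
    import clip  # OpenAI's CLIP package

    device = "cpu"  # kept on CPU for simplicity; clip.load defaults to fp16 on CUDA
    clip_model, _ = clip.load("ViT-B/32", device=device)

    # Encode the prompt once; it stays fixed for the whole run.
    prompt = "a movie poster for Charlie and the Chocolate Factory"
    with torch.no_grad():
        text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Toy stand-in for the real VQGAN decoder (the actual one comes from
    # taming-transformers); it only exists here so the loop runs end to end.
    toy_decoder = torch.nn.ConvTranspose2d(256, 3, kernel_size=14, stride=14).to(device)
    for p in toy_decoder.parameters():
        p.requires_grad_(False)

    # The latents are the only thing being optimized; both models stay frozen.
    latents = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([latents], lr=0.05)

    for step in range(200):
        image = torch.sigmoid(toy_decoder(latents))  # 1 x 3 x 224 x 224
        image_features = clip_model.encode_image(image)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)

        # CLIP only scores the image against the prompt; the gradient of that
        # score is what nudges the latents toward "looks more like the prompt".
        loss = -(image_features * text_features).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

So nothing is ever copied out of a training image directly: the only signal is "does this render score higher against the text embedding than the last one did."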




What do you suppose the mechanism is for the Charlie and the Chocolate Factory image having a golden ticket held aloft by somebody’s right hand, with a person in a purple outfit and top hat? The page says:

> a brief text description of a movie

However, apart from the existence of a golden ticket, I wouldn't expect those details to make it into a brief description of the film. And yet there's an original poster matching those details that the VQGAN + CLIP generated image seems to draw from.


Even more convincing to me is the face of John Malkovich being on the poster of Being John Malkovich. Unless the prompt includes a pretty accurate description of his face (hairstyle, gender, age, facial hair, skin color), the model must have encountered his appearance in its training set.


>(hairstyle, gender, age, facial hair, skin color)

That's not enough to reconstruct John Malkovich's face from text; you'd also need minute facial-feature parameters (eye shape, nose shape, eye-to-nose distances, etc.).


Because he is famous on the Internet, CLIP “knows” what John Malkovich looks like. Or, more accurately: what an image people would label “John Malkovich” feels like.
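
A quick way to see what that means in practice is CLIP's zero-shot scoring, sketched below with OpenAI's clip package (poster.jpg is a hypothetical local copy of the generated image):

    import torch
    import clip
    from PIL import Image

    device = "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Score one image against a few candidate captions; the caption CLIP
    # prefers is, in effect, what it "thinks" the image shows.
    image = preprocess(Image.open("poster.jpg")).unsqueeze(0).to(device)
    captions = ["a photo of John Malkovich",
                "a photo of an anonymous middle-aged man",
                "a movie poster"]
    text = clip.tokenize(captions).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)

    for caption, p in zip(captions, probs[0].tolist()):
        print(f"{caption}: {p:.3f}")

If the first caption wins by a wide margin, the generated face has landed in the region of embedding space that images labeled "John Malkovich" occupy, which is all the generator needs to reproduce his likeness.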


Wouldn't the most obvious explanation be a description that mentions Willy Wonka's chocolate factory, which doesn't really turn up anywhere in the training data except the original film media?

Star Wars is an interesting example because it appears to include elements lifted directly from the film (bits of a stormtrooper's body) alongside a princess who definitely isn't Leia. The algorithm might be creating things from scratch at a high level, but the constituent elements are pretty clearly close reproductions of parts of the source material.


It would be interesting to see how well the newer CLIP-guided diffusion model works. This is a collection of what it generates with the prompt 'mad max alien spacecraft landed in the desert':

https://i.imgur.com/A1sAaev.jpg





