StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (github.com/orpatashnik)
80 points by giorgiop on April 4, 2021 | 13 comments



All I know is we are 2 research papers away from fully realizing Celery Man.



Why not link the original? https://youtu.be/maAFcEU6atk


The Colab notebooks are a good way to test this out. The optimization one can save a frame at each optimization step and render the frames as a video, which can make for some fun interpolation: https://twitter.com/minimaxir/status/1377480997684453378

Demo of global directions: https://twitter.com/minimaxir/status/1378766961937555457
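
For reference, here is a rough sketch of what that frame-per-step optimization loop might look like. The load_pretrained_stylegan and make_clip_loss helpers are hypothetical stand-ins for a pretrained StyleGAN generator and a CLIP-based text loss, not StyleCLIP's actual API:

    import numpy as np
    import torch
    import imageio  # plus imageio-ffmpeg for mp4 output

    # Hypothetical stand-ins for a pretrained StyleGAN generator and a
    # CLIP-based text loss -- not StyleCLIP's actual interfaces.
    generator = load_pretrained_stylegan()
    clip_loss = make_clip_loss("a smiling face")

    w = torch.randn(1, 512, requires_grad=True)  # latent code to optimize
    opt = torch.optim.Adam([w], lr=0.05)

    frames = []
    for step in range(300):
        opt.zero_grad()
        img = generator(w)          # (1, 3, H, W), values in [-1, 1]
        loss = clip_loss(img)
        loss.backward()
        opt.step()

        # Dump the current image as a video frame.
        frame = (img[0].permute(1, 2, 0).detach().cpu().numpy() + 1) * 127.5
        frames.append(frame.clip(0, 255).astype(np.uint8))

    # Stitch the per-step frames into the interpolation-style video.
    imageio.mimsave("optimization.mp4", frames, fps=30)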


Language will be the next interface to software. To get software to do something, you will simply ask it. This work is an example.

I've been documenting this theme in a Twitter thread here: https://twitter.com/dmvaldman/status/1358916558857269250


I imagine that would be pretty popular as an Instagram filter, where people could just say "remove my zits," "clean up my eyebrows," etc.

Or for Zoom. The Surrogates movie comes to mind.


How much manual work is required in training the text model? Do I need to give examples?


I doubt whether this kind of AI can fully understand human languages. If the answer is no, will we create a new genre of languages serving them specifically? Imagine that in the near future, programmers are not eliminated by AI; instead they code in a language that looks like spoken language but is unnatural for humans, designed specifically for AI like this.


It’s already here and called prompt engineering. See Gwern’s extensive explorations of this [1].

I’ve been building a product on GPT-3 [2] using extensive prompt engineering. It’s a bit like programming, a bit like writing. It’s kind of like giving instructions to a child, but a child with essentially infinite memory and perfect recall. Some tasks work easily with a direct command, while others need quite a bit of massaging to get coherent results: for example, constructing an entire fictional scene, or a document that would be found in the real world when all you actually want as output is one paragraph of that document.
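
For concreteness, here is a generic sketch of that "document scaffold" trick against the GPT-3 completion API as it looked in 2021. The prompt contents are invented for illustration, and this is not Sudowrite's actual prompting:

    import os
    import openai  # pre-1.0 openai client, contemporary with GPT-3's beta

    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Frame the task as a document that would exist in the real world,
    # then stop generation once the one paragraph you want is finished.
    prompt = (
        "The following is an excerpt from a travel magazine's feature "
        "on coastal towns in Portugal.\n\n---\n\nThe town of"
    )

    response = openai.Completion.create(
        engine="davinci",   # GPT-3 base model, circa 2021
        prompt=prompt,
        max_tokens=120,
        temperature=0.8,
        stop=["\n\n"],      # cut off after the first paragraph
    )

    print(response["choices"][0]["text"])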

I do think that as these language models mature, prompt engineering will go by the wayside. With minimal training, you’ll be able to tell the AI precisely what to do.

[1] https://www.gwern.net/GPT-3
[2] https://www.sudowrite.com/


Humans can’t even fully understand human languages.

Setting that aside, the starting point is not that they aren’t (or won’t in the future be) capable enough to understand us. It’s the opposite.

They will be so far ahead of us that they will have to dumb things way down for us to barely follow along with what’s happening.

Of course, as is wise on HN, you do carefully plant some weasel words, “this kind” of AI being the most obvious escape hatch for defending your argument. But I assume people are interested in the bigger-picture AI, not just a narrowly defined AI like this repo only, or this approach only, or this git hash of this branch of this repo only, etc.


I think expecting AI to either “fully understand” human language or not is a false dichotomy.

Right now, many AI systems can receive instructions through Python (which, to me, looks like unnatural language but can be spoken). Systems like CLIP and systems built around the GPT models can take in massaged English-language prompts and return an AI-generated output based on them.

I think we will asymptotically approach having our systems “fully understand” human language, but I also think we’ve already arrived at your implied future of communicating with them through an unnatural, intermediate language. Isn’t that exactly what programming is for?
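
To make the "massaged English prompts" point concrete, this is roughly how CLIP scores free-form text against an image, following OpenAI's released clip package; the image path and the candidate phrases here are just placeholders:

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Placeholder inputs: any image and any candidate descriptions.
    image = preprocess(Image.open("face.png")).unsqueeze(0).to(device)
    text = clip.tokenize(["a smiling person", "a frowning person"]).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()

    print(probs)  # e.g. [[0.92 0.08]] -- how well each phrase matches the image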


StyleCLIP doesn't necessarily show off what CLIP is capable of, much less what NNs are already capable of right now. If you're interested, you should read through https://openai.com/blog/dall-e/ and https://openai.com/blog/clip/


"Adobe Research" This will probably get added to Adobe PhotoShop.



