We're still in the first innings of this stuff. Zero-shot/CLIP work will extend to audio, music, music videos and perhaps full movies. I would love to see:
"The Phantom Menace with Jar Jar Binks replaced by leonardo dicaprio using the voice of James Earl Jones"
Take it further. I want to see "an entirely new Star Wars movie directed by the Coen brothers where between scenes of relationship dysfunction, reality is revealed to be a Matrix-like prison owned and exploited by Disney and policed by the Jedi."
WTF, after waiting about two minutes for your site to load, all I could tell from your product page is that I could somehow spend $1,299.00 on whatever it was.
DALL-E was amazing in that it showed good image generation worked at all, but nothing else actually uses its model architecture; even craiyon/dalle-mini, which is popular right now, is quite different.
That said, I have access to DALL-E 2 and I have to say it's not as great as it looks. It is not productizable as is and couldn't replace an artist: it can't render text, the art styles change wildly and aren't controllable, image quality degrades with longer prompts, and, most importantly, the training data has had everything vaguely interesting censored out of it.
Which is surprising, since GPT-3 is very complete and, if anything, a little too uncensored. Not sure if they think visuals are worse, or if they just have a lot of extra ethicists messing with it now.
I feel like censoring the training data as a way of attaining "safety" is a cheap fix that will obviously never extend to AGIs, which will probably have to be trained in the real world to even be AGI.
In OpenAI’s case it is less about safety and more about serving outputs from an API without causing huge PR backlash and without having to wait for _proper_ safety research to be completed to do so.
Besides that, “safety” is a very abstract term, and when it comes to a specific model like this the problems become more prosaic things like “embarrassment”.
That could be the embarrassment of people you’ve made deepfakes of, but it could also be OpenAI embarrassed that their outputs look bad, or users embarrassed that they got an NSFW output for an SFW prompt.
I think the censorship is also about copyright though; it barely knows any characters while dalle mini users are generating stuff like “Shrek on trial for murder” all day.
Indeed, and while I have no access to DALL-E 2, I’d be willing to bet it happily outputs Minecraft-style imagery and IP, as business-daddy MS has assured them it won’t be an issue.
Well “Minecraft” isn’t a character so there’s definitely some of that in it. It knows the look of other IPs too, like various movies or games like Borderlands, it’s just that it doesn’t know many people’s names whether they’re fictional or not. So maybe “Minecraft Steve” wouldn’t work - can’t tell atm as I’m out of credits.
From earlier prompts I tried, “Minecraft” makes things look conceptually blockier rather than visually blockier, if that makes sense. So a “Minecraft mansion” will turn into a midcentury-modernist house with flat walls and roofs rather than being made of voxels.
That’s a pretty neat effect, but if you use names of video games in dalle or other image AIs they tend to start generating UI elements on top of your image and it looks bad.
This may have more to do with them filtering out training data containing names than anything else. They did the same thing for their released GLIDE checkpoint (which predates DALL-E 2 by a couple of months).
Indeed. The first evidence for this came from the DALL-E paper itself, where they used CLIP to rerank outputs (classifier-free guidance hadn’t yet been applied to fix DALL-E’s problems with noisy outputs).
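For anyone curious how CLIP reranking works in practice, it's simple enough to sketch: generate a batch of candidate images with whatever generator you like, score each one against the prompt with CLIP, and keep the highest-scoring ones. A rough sketch, assuming the public openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers and a hypothetical `candidate_images` list of PIL images from some generator (not the DALL-E paper's exact pipeline, just the general idea):

    # Rough sketch: rerank candidate images by CLIP similarity to the prompt.
    # `candidate_images` is assumed to be a list of PIL.Image objects from any
    # text-to-image generator; the checkpoint is the public CLIP ViT-B/32.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def rerank(prompt, candidate_images, top_k=4):
        inputs = processor(text=[prompt], images=candidate_images,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            # logits_per_image: (num_images, 1) similarity scores vs. the prompt
            scores = model(**inputs).logits_per_image.squeeze(1)
        best = torch.topk(scores, k=min(top_k, len(candidate_images))).indices
        return [candidate_images[i] for i in best]

The generator itself doesn't matter here; the whole trick is "sample many, keep what CLIP likes", which is why it worked as a stopgap before guidance methods landed.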
It's a bit of a freaky question (you mentioned the same example last week too; I had to double-check if you were the same person), but I think it's actually really important and we should be talking more about things like this. It's entirely feasible with current technology, and we can't just ignore that fact. We're basically one rich eccentric (who can afford the server time) away from a model that can remove the clothing from an image of a person. You can't put that cat back in the bag.
Most AI is currently done by big, "serious" companies that both care about liability and the bad press of being associated with that sort of thing, and also have a lot of AI-ethicist-type folks on board who care a lot about what they consider "misuse". Some AI designs try to limit the NSFW training data used in the model (e.g. DALL-E 2), while others try to fine-tune or censor results after the fact (e.g. GPT-3).
Right now the adult-oriented AI applications lag slightly behind the cutting edge, but they make up a shockingly big share of the consumer base, both current and potential. Adult content is probably one of the biggest potential applications for AI, and there are some really fascinating ethical questions around it (e.g. the ethics of AI-generated porn vs. real-life porn, considerations around real people, minors, other illegal content, etc.).
Generally, adult-oriented models are either hobbyist clones/finetunes of existing models, or just existing models that people have figured out ways to get to work with adult inputs. There are plenty of AI model hosting services out there that have no qualms about being used for shady or even illegal purposes, so it's difficult if not impossible to stop it from the server provider side.
We need to be thinking more about how we want to handle that sort of thing socially/culturally/legally, because it's gonna happen whether we want it to or not.
Our best minds are working on this amazing, near-magical new technology... which will end up being productized into an Instagram filter service to dynamically inject a stained glass unicorn in place of a horse in a video.
That's cool and all, but also really stupid and a pointless distraction compared to how novel the underlying mathematics and science are. This will quickly become a commodity and humans will acclimate to seeing such tricks.
The content produced won't even be considered particularly impressive.
Damn. I was hoping the singularity would be better.
I see it differently. As a robotics engineer I know the biggest impediment to robotics development is getting computers to understand the real world. The work on multimodal neurons, which see the word "cake" and know to associate it with images of cake, is a key stepping stone toward a fully functional embodied AI that can solve difficult real-world problems. CLIP, DALL-E, and all these offshoots are representations of what we can pull from these efforts today. But long term this work will be incorporated into bigger and more capable AI systems.
Just think: when I ask you “walk into the workshop, grab a hammer and a box of nails, and meet me on the roof to help me secure some loose shingles”, your mind is already imagining the path you will take to get there, what it will look like when you locate and grab the hammer and nails, and you’ve filled in that to get on the roof you have to meet me in the back yard to climb the ladder, which I never mentioned.
All these tiny details that your mind handles effortlessly take huge efforts like CLIP to sort out. And even CLIP is only text and images; there is a lot more to go from there.
A lot of people focus on DALL-E and the artifacts that come out along the way, but these are not the destination, just little stops showing the progress we are making on a much larger journey.
The thesis "Best new minds" is not even correct. If the "Best new minds" didn't create things like social networking and instagram we wouldn't even have the data sources to build upon these new algorithms to even work. Also, without a bunch of video game makers actually pushing graphics cards we also would have the hardware to do these things. So the "best new minds" accidentally enabled more "best new minds".
I think now is a good time (now that these models are out of the research zone and actually working) for a flood of new people to enter this space and try to come up with more creative ideas than that. It's an exciting time for cool ideas and cool projects if we take off our cynicism hats for a bit.
Well, you will have to do some work to understand them (read papers, etc.). But versions distilled from the larger models are quite capable of running even on the web. There's definitely cool low-hanging fruit here (though it's not plug-and-play, import-a-library stuff). The main thing is that these models have been proven to be as powerful as they are (last year it wasn't clear they'd get this good), so with some effort there's definitely cool stuff to be built. I'm excited (and working on it myself).
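To give a flavour of that low-hanging fruit: with an off-the-shelf CLIP checkpoint, a toy semantic image search over a folder of photos is maybe a dozen lines. A rough sketch, assuming the public openai/clip-vit-base-patch32 weights and a made-up "photos" folder and query string:

    # Rough sketch: embed a folder of images with CLIP and search them by text.
    # The folder name and query are illustrative placeholders.
    from pathlib import Path
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    paths = sorted(Path("photos").glob("*.jpg"))
    images = [Image.open(p).convert("RGB") for p in paths]

    with torch.no_grad():
        img_emb = model.get_image_features(
            **processor(images=images, return_tensors="pt"))
        txt_emb = model.get_text_features(
            **processor(text=["a dog on a beach"], return_tensors="pt",
                        padding=True))

    # Cosine similarity between the query and every image, best match first.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    ranking = (img_emb @ txt_emb.T).squeeze(1).argsort(descending=True)
    for i in ranking[:5]:
        print(paths[int(i)])

That's the "not plug and play, but close" point: the hard part isn't the dozen lines, it's knowing which model to reach for and what its embeddings are actually good at.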
Yes, lowering the cost of special effects means they're not that special anymore and there will be lots of crap. On the other hand, it lowers the cost of filmmaking, so there should be storytellers who use this to good effect, where the special effect isn't that noticeable but serves the story.
Though, a question is whether the good storytellers can be found easily. It seems like the situation is similar in fan fiction.