CogVideo: Large-Scale Pretraining for Text-to-Video Generation via Transformers (github.com/thudm)
128 points by aero-glide2 on May 30, 2022 | 32 comments



The cat really is out of the bag at this point. Google and OpenAI won't release their checkpoints, but it's just a matter of time (not long, probably months) before some random group of people with enough patience and data releases fully working, ready-to-use text-to-video models. You really don't need insane compute if you have patience on the scale of a few months.

Exciting times.


Yup it’s not far, and the algorithms aren’t a secret. Five years down the road, adoption will explode.


The first short-term application for this may be in online advertising: cheap, short, low-resolution clips that are only meant to grab attention and can be as exotic as you want, related to a product you want to sell. The advertiser cherry-picks from a pool of generated clips related to their query, so the hit-and-miss nature of these models won't be a big problem.


Pornography. The first application will be pornography.

The first mover who can surreptitiously replace cam girls doing stuff like drinking milk out of dog bowls while wearing dog costumes is going to make so much money from unsuspecting rubes who think they're actually paying strangers to degrade themselves.

"A fool and his money are soon parted"


Definitely. Porn sites tend to be leaders in many ways. Seen it in UI/UX features. iirc pornhub/mindgeek had the thumbnail timeline scrub and timeline viewing hotspots before YouTube and other video sites.


And video compression, by extension. As for porn, a choreographer can express intent in a video, put it through a NN to decompose it into a vector representation, add the girl to the blender, and put it back through.


Unless compute costs drop to near free, it will always be cheaper to pay some desperate and/or "liberated" woman a few USD to do such things.


That was the old porn paradigm.

I think you're unaware of just how much some people make on onlyfans. It's far more than a few dollars.

Not to mention that something like this will let you generate body types that are very rare, if not outright impossible, and animate them doing things that are also very rare or outright impossible.


Key is some. It’s a Pareto distribution.

Perhaps synthetic porn will get there some day. But we’re far off from being able to do consistent scenes and the tools to create them.


Then they'll be A/B tested and possibly even without human moderation since you can filter bad content by just checking with CLIP what it contains.
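
A rough sketch of what that kind of unattended filter could look like. The vectors here are made-up stand-ins; in practice the frame and text embeddings would come from a CLIP image and text encoder. The function names, the 0.8 threshold, and the "reject if any frame matches" rule are all assumptions for illustration, not anything the comment specifies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def clip_is_acceptable(frame_embeddings, banned_embeddings, threshold=0.8):
    """Reject a clip if any frame is too similar to any banned concept.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    for frame in frame_embeddings:
        for banned in banned_embeddings:
            if cosine(frame, banned) >= threshold:
                return False
    return True

# Toy 3-dimensional embeddings (real CLIP embeddings have hundreds of dimensions).
banned = [[1.0, 0.0, 0.0]]
safe_clip = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]
bad_clip = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]

print(clip_is_acceptable(safe_clip, banned))
print(clip_is_acceptable(bad_clip, banned))
```

Checking every frame rather than a sample is the conservative choice; a real pipeline would probably subsample frames to save compute.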


If the video doesn't load, try this https://twitter.com/ak92501/status/1531017163284393987


The lion drinking water is hilarious.


Training on video might unlock new capabilities in language models. The volume of data in video format is huge, and video contains different information: common-sense patterns that are rarely described in text. It might empower Gato-like RL agents to quickly understand and solve tasks in new environments. Even training on raw audio might open up language models to under-represented languages or languages with little written text, let them interpret sentiment from tone, and improve music composition.

So far we have

- multi modality - text, image, audio, video, math, code. At some point in the future brain scans will also become a cheap modality to train on.

- multi task - finetuning on hundreds of tasks at once and getting zero shot task capabilities

- multi memory - beyond their limited buffer, language models can use retrieval/search over an external corpus or knowledge base and in-batch memories. This opens up the possibility of large-scale context and of updating recent factual information without retraining; it can also make a small LM perform as well as a huge LM that lacks the memory

- multi environment - huge training corpus is great, but is fixed, while environments are dynamic. Agents interacting with games, chat bots, robots, REPL can explore endless scenarios interactively.
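
To make the "multi memory" item concrete, here is a toy sketch of retrieval over an external corpus, assuming simple word overlap as the relevance score. Real systems would use learned embeddings and a vector index; `retrieve`, `build_prompt`, and the three-line corpus are hypothetical and only illustrate the idea of feeding retrieved text into the model's limited buffer.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real tokenizer."""
    return set(text.lower().split())

def retrieve(query, corpus, k=1):
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=1):
    """Prepend retrieved documents to the question as extra context."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy external memory; updating it requires no retraining of the model.
corpus = [
    "text to video models generate short clips from a prompt",
    "retrieval lets a small model look up recent facts",
    "agents can explore games and chat environments interactively",
]

print(build_prompt("how can a small model use recent facts", corpus))
```

The point of the design is that the corpus can be swapped or extended at any time, while the model weights stay fixed.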



Since this comes from a Chinese university, they will likely drop the code/checkpoint, or at least API access, too (their text->image model already has an API).


Dark green t-shirt guy (rightmost column, second from the bottom) looks like he has entirely lost his mind.


Where's the paper?



cool, but where is the code?


i like how every frame it's a different person


We generated images from text. We generate videos from text. What's next? Do we generate bug-free video games or business application source code from text?

What would happen if we could translate videos into holograms that can be projected into the living room? Or even holographic virtual objects that can be felt?

Everything is converging to the ultimate use case: porn and virtual sex.

A solution against population decline or population explosion might be to conjure up holographic sex objects. It will cost nothing to the participant, only their sperm or eggs submitted to a government affiliated contractor.

To increase population, the freemium user donates their sperm or eggs to a fulfillment center. To decrease population, the user notices nothing; the fertile fulfillment center is tasked with rejecting or accepting a constant stream of sperm and eggs.

Babies born out of this state system would literally own nothing and be happy. Their foster parents have access to all well-being and health resources provided by the state. Divorce is at an all-time low due to the fact that people marry solely for the purpose, as nobody needs to work for a roof or gas anymore. Every need, every high (state regulated ofc), every material need (permanent tactile hologram objects that look indistinguishable from the real thing but are fixed to a specific room or ultrasound-speaker-lined walls).

Humanity is chained to this Matrix-like system where humans are harvested and where the participant only provides seed and in return receives everything from the state. Food, shelter, entertainment, simulation of meaningful work (euro truck simulator 2223), all is provided and you will own nothing and be happy.


The two girls kissing looks innocent, but it shows the potential for AI to create believable virtual porn. They don't give the prompt for that video for some reason...


> but

Everyone is freaking out about this. Wouldn't it just be better if we just accepted it and got on with our lives?

When anyone can generate any video of any subject doing anything at all, we just need to come to grips with that fact. It's the new normal, and there's no going back.


It's gonna be rough for the gig workforce at onlyfans


Services like that are about the "personal" (parasocial) connection.


I think we are pretty close to advances in language models, text to speech, and image generation automating the parasocial relationship. When your AI gf looks how you want, (virtually) does whatever you want, and talks to you however and whenever you want - I don't see how OnlyFans and the like will compete.


Comparative advantage and the power of boredom help here. “Hiring” people means you get things you don’t want or didn’t ask for, and that’s a good thing.


Yeah. My gf was a "specialist" in her time at a dungeon while she was paying her way through art college. There will always be 1% of people who can pay for a professional, and 1% of girls who are great at controlling their urination at those guys' faces. But realistically, the other 99% of dudes will definitely settle for an intelligent blow-up doll and an AI girlfriend. This is just hijacking the hard-wired behavior of testicles. The people who really get cut out of this process are the 99% of girls who are not AI bots or dominatrixes and just want to meet a dude who isn't jacked into porn 24/7.


You won’t get that from Chinese research, it’s illegal.

Also noticed their “anime” prompt doesn’t look like anime; it looks like animated copyright-free clip art. I can’t read the original Chinese prompt though, maybe it makes sense there.


Is the time ripe for a porn startup?


It literally is


It's 2022 and AIs are generating videos, yet we're still unable to properly embed a video into an HTML page.



