
Something like a white paper with a mood board, color scheme, and concept art as the input might work. This could be fed into an LLM "expander" that increases the word count and specificity, followed by multiple review passes to nudge things in the right direction.
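A rough sketch of what that expander-plus-review loop could look like, assuming an OpenAI-style chat API; the model name, system prompts, and sample brief are all placeholders, not a real pipeline:

  # Hypothetical "expander" loop: terse creative inputs -> detailed spec,
  # then iterative review passes nudge it in the right direction.
  # Assumes an OpenAI-style chat API; model name and prompts are placeholders.
  from openai import OpenAI

  client = OpenAI()

  def expand_brief(brief: str, model: str = "gpt-4o") -> str:
      """Turn a terse brief (white paper + mood board notes) into a detailed spec."""
      resp = client.chat.completions.create(
          model=model,
          messages=[
              {"role": "system",
               "content": "Expand the creative brief into a detailed, specific production document."},
              {"role": "user", "content": brief},
          ],
      )
      return resp.choices[0].message.content

  def review_pass(draft: str, feedback: str, model: str = "gpt-4o") -> str:
      """One review cycle: apply human notes to the current draft."""
      resp = client.chat.completions.create(
          model=model,
          messages=[
              {"role": "system",
               "content": "Revise the document per the reviewer's notes. Change only what the notes ask for."},
              {"role": "user", "content": f"Document:\n{draft}\n\nReviewer notes:\n{feedback}"},
          ],
      )
      return resp.choices[0].message.content

  draft = expand_brief("Noir detective short; rainy neon city; palette: teal and amber; melancholy tone.")
  draft = review_pass(draft, "The detective should feel weary, not cynical. Keep the streets mostly empty.")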



I expect this kind of thing is actually how it's going to work longer term, where AI is a copilot to a human artist. The human artist does storyboarding, sketching in backdrops and character poses in keyframes, and then the AI steps in and "paints" the details over top of it, perhaps based on some pre-training about what the characters and settings are so that there's consistency throughout a given work.

The real trick is that the AI needs to be able to participate in iteration cycles, where the human can say "okay, this is all mostly good, but I've circled some areas that don't look quite right and described what needs to be different about them." From what I've played with, current AIs aren't very good at revisiting their own work: you're basically tweaking the original inputs and starting over from scratch each time.
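For what it's worth, the closest thing today to "circle the areas that are wrong" is mask-based inpainting. A minimal sketch with the diffusers library, assuming the standard Stable Diffusion inpainting checkpoint; the image and mask filenames and the prompt are placeholders:

  # Mask-based revision: regenerate only the circled region, keep the rest.
  # Assumes the diffusers library and the runwayml/stable-diffusion-inpainting
  # checkpoint; file paths and prompt are placeholders.
  import torch
  from diffusers import StableDiffusionInpaintPipeline
  from PIL import Image

  pipe = StableDiffusionInpaintPipeline.from_pretrained(
      "runwayml/stable-diffusion-inpainting",
      torch_dtype=torch.float16,
  ).to("cuda")

  image = Image.open("keyframe.png").convert("RGB")        # the mostly-good frame
  mask = Image.open("circled_regions.png").convert("RGB")  # white = redo, black = keep

  result = pipe(
      prompt="the detective's coat, rain-soaked wool, teal rim light",
      image=image,
      mask_image=mask,
  ).images[0]
  result.save("keyframe_revised.png")

The catch is exactly what's described above: the model regenerates the masked region from the prompt rather than truly revisiting its own earlier decisions, so consistency with the rest of the frame is still hit or miss.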


We will shortly have much better tweaking tools, which work not only on images and video but also on higher-level concepts, like which traits a character should exhibit. See for example the presentation from Shapeshift Labs.

https://www.shapeshift.ink/


And I think this is realistically the shape these tools will take for the foreseeable future.


You should see what people are building with open-source video models like HunyuanVideo [1] plus ComfyUI and ControlNets. It blows Sora out of the water.

Check out the Banodoco Discord community [2]. These are the people pioneering steerable AI video, and it's all being built on top of open source.

[1] https://github.com/Tencent/HunyuanVideo

[2] https://banodoco.ai/



