Something like a white paper with a mood board, color scheme, and concept art as the input might work. This could be fed into an LLM "expander" that increases the word count and specificity, followed by multiple review passes to nudge things in the right direction.
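That expand-then-review flow can be sketched in miniature. Everything here is hypothetical: `expand` and `review` are stand-ins for LLM calls, and the string manipulation is just a placeholder for what a real model would do.

```python
def expand(brief, detail_fragments):
    """Stand-in for an LLM 'expander': turns a terse brief into a
    longer, more specific prompt by appending detail fragments."""
    return brief + " " + " ".join(detail_fragments)

def review(prompt, corrections):
    """Stand-in for one review pass: apply targeted swaps that
    nudge the expanded prompt in the right direction."""
    for wrong, right in corrections.items():
        prompt = prompt.replace(wrong, right)
    return prompt

brief = "moody cyberpunk alley"
details = ["neon signage,", "wet cobblestones,", "teal-and-orange palette"]
prompt = expand(brief, details)

# Several review cycles, each fixing one thing the human flagged.
for fix in [{"teal-and-orange": "magenta-and-cyan"}, {"alley": "alleyway"}]:
    prompt = review(prompt, fix)

print(prompt)
# → moody cyberpunk alleyway neon signage, wet cobblestones, magenta-and-cyan palette
```

The point of the structure is that the review step operates on the expanded output rather than the original brief, so each pass is a small correction instead of a full regeneration.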
I expect this kind of thing is actually how it's going to work longer term, where AI is a copilot to a human artist. The human artist does storyboarding, sketching in backdrops and character poses in keyframes, and then the AI steps in and "paints" the details over top of it, perhaps based on some pre-training about what the characters and settings are so that there's consistency throughout a given work.
The real trick is that the AI needs to be able to participate in iteration cycles, where the human can say "okay, this is all mostly good, but I've circled some areas that don't look quite right and described what needs to be different about them." As far as I've played with it, current AIs aren't very good at revisiting their own work; you're basically just tweaking the original inputs and otherwise starting over from scratch each time.
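One way to picture that "circle the bad areas" loop is region-constrained compositing: regenerate only the masked regions and keep everything else from the previous pass. This is a toy sketch over placeholder pixel grids; real inpainting tools do the equivalent with masks over image (or latent) tensors.

```python
def patch_regions(image, revision, mask):
    """Composite a revised image into the original only where the
    human 'circled' regions (mask == 1); everything else is kept,
    so the model revisits its own work instead of starting over."""
    return [
        [new if m else old for old, new, m in zip(orow, nrow, mrow)]
        for orow, nrow, mrow in zip(image, revision, mask)
    ]

original = [[1, 1, 1],
            [1, 1, 1]]
revision = [[9, 9, 9],   # a full re-render; most of it should be discarded
            [9, 9, 9]]
mask     = [[0, 1, 0],   # only the middle of the top row was circled
            [0, 0, 0]]

print(patch_regions(original, revision, mask))
# → [[1, 9, 1], [1, 1, 1]]
```

The mask is what makes the cycle an iteration rather than a restart: the untouched regions are carried forward verbatim, so earlier approved work can't drift.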
We will shortly have much better tweaking tools, which work not only on images and video but also on concepts like which aspects a character should exhibit. See, for example, the presentation from Shapeshift Labs.