SHOW-1 and Showrunner Agents in Multi-Agent Simulations (fablestudio.github.io)
182 points by boffbowsh on July 19, 2023 | 49 comments



The interesting thing here is that Matt and Trey have codified intentionality as a primary writing tool, and whatever this is doesn't mention that once. That's a pretty major thing to overlook, like a blind spot or Achilles' heel of the AI devs.

In fact, this is a blind spot of ALL AI that I've yet seen.

Matt and Trey approach writing scenes with the following intent: if a scene can be described as following up a previous scene with 'AND THEN', it fails.

If the following scene can be described as 'BUT' or 'THEREFORE', they write it and it goes in the show.

This is so simple, yet it's completely absent from what AI is doing, and seemed absent from the example script (I'd have liked a written version of their fake South Park script, as it was pretty insufferable). I'm not sure what will be needed beyond expanding context length. Context length is merely a technical problem… but this is a significant failing in what we get out of AI.

This is where the sea of meaninglessness comes from. Good creators have a great deal of context and intentionality in their work, and for all that Matt and Trey are puerile and crass, they've got a clearly expressed agenda artistically (and philosophically).

There are also human creators who are very facile, but very prone to 'AND THEN': AI will have a much easier time replacing humans like this than replacing Matt and Trey.


Excerpt of them discussing the but/therefore rule: https://www.youtube.com/watch?v=vGUNqq3jVLg


Beyond not being similar to Matt and Trey, I've started being able to recognize ChatGPT's writing style. It loves reiterating a portion of what was said by it or someone else ("yes, AI could be a tool to ..."), which makes for awful entertainment writing.

The joke about Trump was pretty good, though.


I’m not entirely sure I get it. You can literally prompt the LLM with “therefore” or “but,” and it’ll continue. You simply take the text generated prior and append “therefore,” and it’ll keep generating. If you want to lend more intentionality you can frame it with “therefore <insert intention>” and it’ll continue generating with that intention.

I’ve been up all night flying internationally but I might have misunderstood.
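
A minimal sketch of the append-a-connective idea described above. Everything here is an assumption for illustration: `build_prompt`, `continue_story`, and the `generate` callable are hypothetical names, and `generate` stands in for whatever LLM completion call you actually use (no real API is shown).

```python
from typing import Callable, Optional

# The only connectives this sketch accepts; "and then" is rejected on purpose.
ALLOWED = ("but", "therefore")


def build_prompt(story_so_far: str, connective: str,
                 intention: Optional[str] = None) -> str:
    """Append 'But' or 'Therefore' (plus an optional intention) to the text."""
    word = connective.strip().lower()
    if word not in ALLOWED:
        raise ValueError(f"expected one of {ALLOWED}, got {connective!r}")
    prompt = f"{story_so_far.rstrip()} {word.capitalize()}"
    if intention:
        # The intention steers where the continuation should go.
        prompt += f" {intention.strip()}"
    return prompt


def continue_story(story_so_far: str, connective: str,
                   generate: Callable[[str], str],
                   intention: Optional[str] = None) -> str:
    """Build the prompt, then let the (hypothetical) model keep writing."""
    prompt = build_prompt(story_so_far, connective, intention)
    # `generate` is assumed to return only the newly generated text.
    return prompt + generate(prompt)
```

For example, `continue_story(text, "therefore", my_llm, intention="the town blames Cartman")` would hand the model the prior text ending in "Therefore the town blames Cartman" and append whatever it produces. Whether the result carries any real intentionality is exactly what the replies below dispute.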


No, not really. The body of text GPT is trained on gets its own interpretation of 'but' and 'therefore'. If you feed it 'therefore' maybe it'll start writing legal documents. If you feed it 'but' maybe it'll just contradict itself, or express some triviality.

What you'd need for Matt and Trey style 'but' and 'therefore' is entire prompts being introduced in the background and switched out. Imagine thousands of words of prompt. 'but' means, fundamentally obstruct something about where your whole prompt is heading, like a screenwriter introducing a twist that must be resolved before the story can continue. 'therefore' means describe something that unfolds obviously as a result of all that's been introduced in the prompt.

These are not sentence-level issues, not output-level stuff. These are prompt level. More than that, they're prompt level with intentionality: you have to understand how a 'but' will fundamentally obstruct your prompt, how a 'therefore' will integrate both your original prompt and the obstruction.

Assume you have to coherently switch around your prompt introducing new fundamentals, and still have that make sense. It might get you rather formulaic results (but, Luke loses his mentor! therefore he must study and meditate and get to the final goal through his own transcendence!) but that just shows you it's working. That's how you get from a pile of arbitrary time-wasting, to a capital S Story.

From there on out, it's about which stories to tell, how far you can depart from the norms while still providing the intentionality and purpose, and what the purpose is :)
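
A rough sketch of what "prompt level" could look like, as opposed to appending a word to the output. This is only one possible reading of the comment above, not anything from the paper: `StoryPlan`, `add_beat`, `scene_prompt`, and the instruction wording are all hypothetical. The whole prompt is rebuilt for every scene, and the newest beat is framed either as an obstruction (BUT) or as a consequence (THEREFORE) of everything before it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical instruction text for each kind of link between scenes.
INSTRUCTIONS = {
    "BUT": "Write the next scene so that it obstructs where the story above was heading:",
    "THEREFORE": "Write the next scene as the direct consequence of everything above:",
}


@dataclass
class StoryPlan:
    premise: str
    # Each beat is (link, summary); link is "BUT" or "THEREFORE".
    beats: List[Tuple[str, str]] = field(default_factory=list)

    def add_beat(self, link: str, summary: str) -> None:
        key = link.strip().upper()
        if key not in INSTRUCTIONS:
            # An 'AND THEN' beat fails the rule, so it never enters the plan.
            raise ValueError(f"beat must be BUT or THEREFORE, got {link!r}")
        self.beats.append((key, summary.strip()))

    def scene_prompt(self) -> str:
        """Rebuild the full prompt: premise, earlier beats, then the new beat."""
        if not self.beats:
            raise ValueError("add at least one beat first")
        lines = [f"PREMISE: {self.premise}"]
        for key, summary in self.beats[:-1]:
            lines.append(f"{key}: {summary}")
        key, summary = self.beats[-1]
        lines.append(f"{INSTRUCTIONS[key]} {summary}")
        return "\n".join(lines)
```

The hard part, deciding what the obstruction or consequence should be, still sits with whoever calls `add_beat`; the sketch only enforces that every scene is linked to the previous ones by one of the two words.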


I think that’s sort of unfair and untrue. Have you used GPT? Yes, if you naively use “therefore” you might get that. But it explicitly semantically matches the context. If the context is a South Park story, and the context also includes prompting on how to respond to trailing therefores, it will almost certainly follow the semantic context it was prompted with. Now - is it as good as Matt and Trey? Of course not! It’ll produce a relatively bland imitation.


My understanding from the paper is that every character had a separate prompt, and somehow they mixed them together.

The 'BUT' approach would require the opposite: start with the story, then somehow integrate character quirks.


AI Mett Porker (I see what you did there)

Maybe in my lifetime an AI could generate a few more seasons of Firefly (that got cancelled too soon)?

Another AI challenge would be to superimpose Harrison Ford onto Alec Baldwin in the movie The Hunt for Red October. That would give nice continuity to the Jack Ryan character across the Tom Clancy movie series (Patriot Games, Clear and Present Danger).


> Maybe in my lifetime an AI could generate a few more seasons of Firefly

You can’t take the Skynet from me


I just want the ability to stop a show the moment it does something terrible, tell the AI - Not that - and have the show be fixed.


I thought Alec did an amazing job as JR in HFRO.


Or you could replace both Alec Baldwin and Harrison Ford with Ben Affleck, since he was Jack Ryan in The Sum of All Fears.


I skimmed the paper and didn't see an answer to this: how much of the video did the AI actually generate? How much of it was touched up by humans, and how much of it was actually drawn/animated solely by humans?


An Engadget story demonstrates their simulation tool and discusses their "Simulation" venture a little bit more. It does look like the video is animated by AI using pre-generated scenes and characters. https://www.engadget.com/the-simulation-ai-put-me-in-a-south...


Based on my understanding after skimming through the paper (with assistance from Claude), the AI did not directly generate any full video content for a South Park episode. It seems like this was how the AI was used:

- Custom diffusion models were trained on South Park character and background image datasets. These models could then generate new South Park-style characters and backgrounds.

- GPT-4 was used to generate dialogue for scenes, based on prompts about the overall episode premise and plot points.

- An "AI camera system" was mentioned for scene setup, but details were not provided on how much of the camera work it handled. Voice cloning was used to generate audio clips of the dialogue.

Note: this is just a skim of the paper; it's entirely possible that Claude and I have missed something.


Judging by the rendered samples on the page, I'm gonna assume only the assets (backgrounds and characters) were AI generated. Animations, text, and camera movement were all done manually.


So not even the script was written by the AI?


So in addition to worrying about fake AI stuff that is presented as real, we now also have real stuff presented as fake AI.


I remember someone a while back tried to pass off a script as being A.I. generated when it clearly wasn't, and I thought to myself, will we need to develop a reverse Turing test? Of course someone already thought of that:

https://en.wikipedia.org/wiki/Reverse_Turing_test


I don't think I've ever seen an "AI wrote this!" script that wasn't very clearly heavily assisted or often outright created entirely by humans.


To be fair, we've always had startups pretending they have "AI" or "machine learning" when it was just Mechanical Turk or someone on Upwork.


That's true. And the term Mechanical Turk by itself refers to an even (much) older 'fake' AI!


Ten Finger Automation or Wizard of Oz are the terms my entrepreneur/investor group uses.


pg used "man behind the curtain" (riff on Oz) for our real estate startup :)


Why’s that a worry?


I assume you agree with the worry about the first part (fake pretending to be real). For the second part, I left out the word, and was more thinking about 'wondering', but I agree the sentence structure suggests 'worry'.

I agree it is not a worry, at least not of the same kind. I do worry a bit though about assuming something and reasoning/discussing with that premise, which is nullified if the premise was wrong.


Red, green, refactor. If it passes the test, the test is not good enough.


Was the intro generated as well, or lifted directly from the real show?

There is a significant drop in believability as soon as the first scene of the generated show starts after the intro. Look at the first few seconds of the first scene. Here, the three boys are just standing statically in the hall while they are talking.

Now take South Park Episode 1, Season 1, from 1997.

https://www.southparkstudios.com/episodes/940f8z/south-park-...

Look at the first few seconds of that opening scene after the intro in this very first episode of the real show. Notice that even though there is a lot of time where some of the characters don’t move at all, they still feel a lot more alive than what you saw in that video that is in the OP. And I think part of it is also the use of exaggerated expressions when they do move. The others not moving is ok, because the ones that do move provide a lot of entertainment with those movements and expressions. And of course the voice acting is very passionate as well.

In the years since then the South Park episodes have gotten even better and better animation.

Maybe I am being too harsh, but the lacklustre animation in the first few seconds of the first scene of the OP video made me not watch any more of the video.

If you want to take on South Park, and you want to beat it at its own game, the bar is very high. Both in terms of script, voice acting, and animation.


I also found the intro to be of very high quality, I wonder if they overfitted on that?


https://youtu.be/piwB5SJqZ7U

Over-fitted or copied.


“The marvel is not that the bear dances well, but that the bear dances at all.”


I've got some 'AI' actors endlessly repeating a scripted 'show': part experiment, part entertainment, all weird. It runs on gpt4all, so I can throw the same script at different models and see the variance and performance on a consistent creative task. You can interact with the story via chat, and it 'remembers' previous interactions, sort of. Interactions won't affect the outcome, which is scripted, but they will change what the actors talk about in each scene. It all runs off a simple YAML script that's meant to be 'easy' to edit. It runs TTS using either ElevenLabs for £££ or Coqui for free, and Bark when I get a GPU. Stability AI creates accompanying images, although they don't sync with the audio; they might at some point... As admin I can mess with max token length and some other settings without restarting, which is nice. It runs pretty much 24/7 on a £250 PC, and it's designed to run at four or five times the speed you see on the 'dev' channel: https://twitch.tv/m88t

Any questions, you can drop them in the Twitch chat... or here...


Assuming I'm reading the paper correctly, this wasn't AI generated so much as it was AI assisted.

They used AI to generate character dialog and voices (easily the worst parts of the video) while the humans guided the plot and picked the lines of dialog that would be used.


My guess is that they could have produced the same thing in roughly the same time without AI. Possibly with better results. Writing individual lines really is not the bottleneck.


I wonder how I would have appreciated this if I didn't know it was AI generated and thought it was some fan-made thing.


Definitely could be something the South Park guys made as a joke.


But not as wordy, it was painful.


the south park guys are funny tho


Weird watching South Park with zero humor in it.

It's giving 'Pokémon Go to the polls' energy - people using a random pop culture reference to flaccidly try to promote their ideological preference.

I hated it, and was only impressed until I realized how much manual human labor went into its construction. The editorial influence of the humans involved was far too heavy handed. I would have preferred something alien, weird and incoherent.

Here's an example of what AI can actually do - full synthesis of script, visuals and audio. It is incoherent but also strangely unsettling: https://www.tiktok.com/@never_ever_never_land/video/72531490...


I wonder how Trey Parker and Matt Stone would react to this?



I believe this sort of system is one of the main reasons why WGA and SAG-AFTRA are striking. Even though this seeks to "augment" the writing process, it ultimately leads to not needing a writer at all. It also just requires an actor provide enough audio to generate a believable voice. All of the limitations to fully automated episodic generation are essentially temporary.


Writers Guild has nothing to worry about


At least not in the next two years.


The irony of the scene of the writer's guild protest commentary


It would have been an opportunity to honor SpongeBob. Not that I've watched it, but my nephews did, and over the years I've read almost only positive things about it.



I do enjoy reading actual dialog coming from Kenny. :)


[deleted]



