There are some approaches that use an LLM to generate “scripts” (you can think of them as a DSL) for composing/arranging media, essentially driving other models to generate parts of the media. One example is WavJourney: https://audio-agi.github.io/WavJourney_demopage/