The goal isn't entertainment but practical simulation. I'm building tools to automate A/B testing, model marketing campaign responses, and optimize content - all by simulating human behavior at scale. Where behavior trees fall short, language models can capture nuanced user reactions that can't be reduced to simple metrics.
Why do you think modeling a bunch of LLM characters and watching their interactions is somehow going to yield a substantially better result than asking an LLM to output content specifically tailored for a particular audience?
If the answer you seek can be observed by watching LLMs interact at scale, then the answer is already within the LLM in the first place.
I simply tested it, and the results are quite different tbh. Now the big question is "Why".
My first thought would be that it's kinda how we humans behave. It feels a bit like the equivalent of the Mom Test. If you ask someone "do you like this", he/she is biased to say "yes", and the same goes for LLMs (I took a dummy example, but you get the idea).
Anyway, I can be wrong, I can be right; ATM no one knows, not even me.
I think what you’re doing there is just averaging out the output of many LLMs. It gives some different results, but there’s no reason you couldn’t arrive at those same results by just complicating the original prompt manually.
(if I can make it realistic enough lol)