I think the issue with LLM-based agent behavior is that it ends up being limited by the need for hand-crafted functions that the NPCs use to operate. And in the end those actual actions and interactions with the visible game world (the tangible game state) are what the player cares about the most. There are only so many ways an NPC can say a thing that ends up linking to the same action taken.
So, if you want the NPCs to be able to take over a town hall after the mayor has gone off the rails (or some equally unexpected event that makes all the possible planning worthwhile), you'll still need a system that keeps track of it all: who the mayor is, that there is a mayor, that there is a town hall, that you can perform a coup in this specific world, the specific ways in which the NPCs can participate in the coup, etc. If you generate all of this completely procedurally, you'll end up with the NPCs falling out of sync with each other when you have a bunch of them going about their days.
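To make that concrete, here's a rough sketch of what I mean by hand-crafted functions sitting on top of a shared world state. Everything in it (`WorldState`, `ACTIONS`, `seize_building`, `apply_proposal`, the proposal dict format) is made up for illustration, not taken from any real engine. The point is only that whatever the LLM proposes still has to map onto an action someone wrote and pass a check against the tracked state.

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    # Hypothetical shared state that every NPC reads from and writes to.
    roles: dict = field(default_factory=dict)          # role -> character name
    locations: set = field(default_factory=set)        # places that exist
    controlled_by: dict = field(default_factory=dict)  # location -> character

# Registry of the hand-crafted actions NPCs are allowed to take.
ACTIONS = {}

def action(name):
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register

@action("seize_building")
def seize_building(world, npc, target):
    # The action only succeeds if the target exists in this world.
    if target not in world.locations:
        return False
    world.controlled_by[target] = npc
    return True

def apply_proposal(world, npc, proposal):
    """Apply a proposed action, e.g. a dict parsed from LLM output.

    Anything that does not name a registered action is rejected,
    no matter how it was phrased.
    """
    handler = ACTIONS.get(proposal.get("action"))
    if handler is None:
        return False
    return handler(world, npc, proposal.get("target"))

world = WorldState(roles={"mayor": "Aldous"}, locations={"town_hall"})
```

With this, `apply_proposal(world, "blacksmith", {"action": "seize_building", "target": "town_hall"})` updates the state, while a proposal naming an action nobody implemented just gets dropped.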
Once you've created a sensible database for keeping track of the current world state, character states, character relations, possible character actions, character needs, etc., doesn't that end up adding up to something that would work almost equally well without the LLM? That said, LLMs are an extremely powerful tool for free-text input into that system on the player's end, even if you're just getting the embeddings of that text and hooking them up to a cosine similarity search over pregenerated NPC responses.
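For the cosine similarity part, a minimal sketch might look like the following. Big caveat: `embed` here is a toy bag-of-words stand-in so the example runs on its own; in practice you'd swap in vectors from an actual embedding model. The `RESPONSES` table, `respond`, and the NPC lines are all invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model (assumption): word counts only.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Pregenerated NPC responses, keyed by the prompt each one answers.
RESPONSES = {
    "where is the town hall": "The town hall is just past the market square.",
    "who is the mayor": "That would be Aldous. Not that anyone is happy about it.",
    "what do you sell": "Bread, mostly.",
}

# Embed the keys once up front; only the player's text is embedded per query.
INDEX = [(embed(key), reply) for key, reply in RESPONSES.items()]

def respond(player_text):
    query = embed(player_text)
    best = max(INDEX, key=lambda entry: cosine(query, entry[0]))
    return best[1]
```

So `respond("tell me who the mayor is")` lands on the mayor line even though the wording differs from the stored key. A real version would probably also want a similarity threshold so unrelated input gets a fallback reply instead of the nearest match.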
The reason LLMs exist at all is that there is a big corpus of text that follows the same standard rules with minimal deviation, a comparatively limited dictionary, and an even more limited set of concepts and words that are generally used inside a given timeframe/domain.
No one has bothered writing a formal description of day to day interactions inside a small town.
LLMs can describe day to day interactions in a small town just fine. They can deliver accurate text about stuff no one has likely ever bothered to write down. For example, I gave it a list of random objects and asked which ones would need to be treated delicately by a robotic hand: a cotton ball, an apple, a rock, a puddle of water, etc. It responded to each item accurately, though I doubt anyone has ever written that a cotton ball doesn't need a gentle touch from a robotic hand.
Without using an AI, I can say with certainty there is a text where “delicately” is applied to a “cotton ball” in relation to the “handling” concept. I’ve just asked ChatGPT about a child’s day in an African village and the result is something taken from a fairy tale with an African spin. Leaving an LLM in charge of that aspect in a game would probably lead to the hand problem we have with image generators.