Show HN: AI-town, run your own custom AI world SIM with JavaScript (github.com/a16z-infra)
429 points by ykhli 9 months ago | hide | past | favorite | 115 comments
Hi HN community! We want to share AI-town, a deployable starter kit for building and customizing your own version of AI simulation - a virtual town where AI characters live, chat and socialize.

Inspired by the great work in the Stanford Generative Agents paper (https://arxiv.org/abs/2304.03442).

A few features:
- Includes a convex.dev-backed server-side game engine that handles global state
- Multiplayer ready and deployment ready
- 100% TypeScript
- Easily customizable: fork it, change character memories, add new sprites/tiles, and you have a custom AI simulation

The goal is to democratize building your own simulation environment with AI agents. Would love to see the community build more complex interactions on top of this. Let us know what you think!

Demo: https://www.convex.dev/ai-town

I made a world, Cat Town, to demonstrate how to customize AI-town. Using C(h)atGPT :)

Demo: https://cat-town.fly.dev/
Code: https://github.com/ykhli/cat-town

If you haven't yet checked out the Generative Agents project referenced by OP, definitely give it a look, it's open source: https://github.com/joonspk-research/generative_agents

Over the weekend Lance Martin got it working with local models using llama.cpp and ollama.ai, which saves money on longer sims since all inference happens locally: https://twitter.com/RLanceMartin/status/1690829179615657985. It's neat how the AI agents interact with each other – e.g. one will host a party and invites will be sent throughout the group.

we've been waiting a long time :)

would be really fun to hook this up to Stardew Valley!

I wonder how llama affects the simulation. Even llama2 consistently fails simple reasoning tasks.

(Though I suppose people aren't very reasonable, so it might give an accurate simulation!)

This is awesome!

Game idea to build on top of this: a tabletop deception-type game where each agent has the goal of convincing the real users that they are in fact also real users. (So each agent is trying to pass a Turing test.)

Every AI agent uses RL to optimally prompt their personal LLM for how they should chat with the human players – e.g. should they try to frame a certain person, should they play it dumb, should they gaslight, etc.

I think it may be even more fun the other way: players need to rat out other players, so they have to pretend to be an AI.

It makes it easier for the AI mods to do their part, and it puts the burden on the players.

I like your idea of find the human. Just building on that idea a little. I know current AI detection programs don't work well. But they would be fun in the context of a game. Call it "Only Robots Allowed" and have it be a single player version of Among Us. Pretend to be a robot while trying to sabotage robot things. AI detection is applied to your conversation with other robots. And also applied to your movements. If you fail the AI detection by not emulating an AI well enough, then it's time for "kill all humans!"

This is probably the most important life skill we should be teaching in schools </pessimism>.

Yep. "Find the AI" is broken for now, because whenever I've played it, humans can just be exceptionally rude/lewd or use super-modern slang. Making the humans try to blend in as AI is much more interesting, as a game.

A game like Press The Button, except there’s half AIs and half human players. The goal for each is to identify the others and airlock them off the ship. Constrain the tests in such a way that open chat using lewd language or whatever is impossible. I’d play that once or twice.

Where do you play that?

It would be pretty easy to beat. One sure way to tell an LLM apart from a human is to type something nonsensical and optionally repeat it multiple times. A human would inevitably answer with something along the lines of "what the hell are you on about dude???", which is something you'd never see from an LLM.

So you make it learn. It collects everything humans say and matches it to the situation, replaying human conversation as needed.

> which is something you'd never see from an LLM.

Might not see it from ChatGPT - but "never from an LLM"? Why would you think that?

In the current state, the most powerful LLMs are also limited in the number of topics they are allowed to discuss. I bet you could easily differentiate between player and AI by asking opinions on some controversial topic.

So like Battlestar Galactica. Who is the Cylon? The Cylon might not even know they are not human until it's revealed to them later.

This is a very interesting use case of the Generative Agents project.

I feel like this is just one of the earlier attempts at creating an interactive experience with this, but there's still so much potential for future games if the technology is refined further. Think of something like Animal Crossing (or any other game where interacting with NPCs is at the forefront of gameplay), but with characters that have completely unique personalities and the ability to generate new dialogue (rather than relying on a limited number of pre-written lines), react to player actions in unique ways, and so on.

I feel like this might be the beginning of a much richer single-player game genre that mixes generative language, image, texture, and other features with procedural roguelike world generation, with the semantic features defined by the LLM. Obviously NPCs, dialogues, etc. can all be context-aware agents bootstrapped by the environment and player history. Content creators, IMO, will be creating assets that are used to fine-tune models and create LoRAs as future game assets. Instead of monotonously creating every asset in a game, they would create styles and ControlNets, language corpora, and human-written lore RAG'd into game context, etc. I think artists will have a wonderful contribution, much richer than simply painting a picture.

Which ultimately ends with thinking as a generative prompt into thoughts which get passed to the universal rendering engine known as reality.

I built something very similar when the paper came out. Watch out for the instances where GPT returns both sides of the conversation (e.g. interpreting a prompt with chat history as being representative of what the output should look like, and putting out both "Jack"'s and "Jill"'s text). The first sample I went to suffered that problem (output from "Lucky"'s perspective below).

"Hey Stella! It's so great to see you! You'll never believe what I've been up to recently. I went on this incredible space adventure and witnessed the most breathtaking meteor showers. And guess what? I even encountered alien beings who communicate through colors! It was absolutely mesmerizing. Stella: Wow, Lucky! That sounds absolutely fascinating. I've always been intrigued by space and the wonders it holds. I would love to hear more about your adventure. Lucky: I'm thrilled that you're interested, Stella! I can't wait to share all the details with you. But before we dive into that, I heard you're passionate about gardening. Is that right? Stella: Yes, that's true! Gardening is one of my favorite hobbies. I find it so peaceful and fulfilling. Lucky: That's fantastic! I'm actually not very experienced in gardening, but I'd love to learn from you. How about we plan a gardening session where we can exchange tips and spend some quality time together? Stella: That sounds wonderful, Lucky! I'd be more than happy to share my gardening knowledge with you. Let's find a time that works for both of us. Lucky: Absolutely! I'm really looking forward to it. But hey, no rush. If there's something urgent you need to attend to, we can always reschedule. Just promise me we'll find another time to connect. Stella: Don't worry, Lucky"

what a descent from loving nature down to "heyyy scheduling is hard"

I hit that too, but the trick to fix it was to use the other person's name plus a ":" prefix as the stop word – so in your case, "Stella:". Then if it tries to continue the other side of the conversation, it just stops.
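As a minimal sketch of this stop-word trick (the helper name and payload shape are assumptions, not from the AI-town repo): you pass the *other* character's dialogue prefix as a stop sequence, so generation halts before the model can write the listener's side of the conversation.

```typescript
// Shape of a chat-completion request carrying a stop sequence.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  stop: string[];
}

// Build a request where the listener's dialogue prefix is the stop sequence.
function buildDialogueRequest(
  speaker: string,
  listener: string,
  prompt: string
): ChatRequest {
  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `You are ${speaker}. Reply only as ${speaker}.` },
      { role: "user", content: prompt },
    ],
    // e.g. ["Stella:"] – the API cuts generation just before this string,
    // so the model never gets to speak for the other character.
    stop: [`${listener}:`],
  };
}
```

The request object would then be handed to whatever chat-completion client the project uses; the `stop` field does the actual work.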

I also noticed this when browsing the demo village; in some conversations it ran through and did both parts.

This ended up being my solution as well, although I still ran into edge cases where GPT would output things like "Stella said". I never did manage to fix this issue 100%, even with creative prompts.

Did you have any luck mitigating this problem?

Sure. To force ChatGPT to role-play, I've found that giving it a character to play and then prefixing your query with underscores for meta-roleplay instructions works well.

I.e. if the LLM is playing a character "Jack" and you are playing the character "James", your query might be "_only reply for your character Jack_ James: I pick up the sword and then turn towards Jack".

It can also be used to influence behaviour, as LLMs often get stuck repeating (not word for word) the same events/descriptions – e.g. greeting your character over and over again and not moving on. LLMs aren't great at having autonomy/agency in the flow of a conversation; I think this is best handled by not providing the entire history of the conversation but instead distilling it to the information relevant to the current query.

But that can also be mitigated manually by bumping their character with underscores, e.g. "_Jack asks what James wants to order_ James: I return Jack's greeting and peruse the Tavern's menu board".
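The convention above is easy to mechanize. A hypothetical helper (not from any library) that wraps the meta instruction in underscores and prefixes the in-character line with the player's name might look like:

```typescript
// Build a roleplay query following the underscore convention described above:
// optional meta instructions go between underscores, and the in-character
// action is prefixed with "<PlayerName>: ".
function buildRoleplayQuery(
  playerCharacter: string,
  action: string,
  meta?: string
): string {
  const metaPart = meta === undefined ? "" : `_${meta}_ `;
  return `${metaPart}${playerCharacter}: ${action}`;
}
```

For example, `buildRoleplayQuery("James", "I pick up the sword", "only reply for your character Jack")` reproduces the commenter's query format.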

I've run into this issue a lot with ChatGPT, and almost never with GPT-4. I know it isn't always possible, but just using GPT-4 prevents this 99% of the time (basically 100% with proper prompting).

You make the other side of the conversation the stop word, i.e. "Jill:" for Jack and Jill.

It's relatively simple to detect this type of defect and handle it during/after generation.
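As a sketch of that post-generation handling (the helper is hypothetical, and it assumes the character name contains no regex metacharacters): truncate the reply as soon as the model starts speaking for the other character, whether it writes "Stella:" or "Stella said".

```typescript
// Cut a generated reply off at the first point where the model leaks the
// other character's dialogue ("Stella:" or "Stella said ...").
function truncateAtOtherSpeaker(reply: string, other: string): string {
  // Word boundary so "Stellar:" in prose doesn't false-positive for "Stella".
  const leak = new RegExp(`\\b${other}\\s*(:|said)`);
  const match = leak.exec(reply);
  return match === null ? reply : reply.slice(0, match.index).trimEnd();
}
```

Running this over every completion, with `other` set to the listener's name, catches both the "Stella:" and "Stella said" failure modes the thread mentions.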

A friend and I recently started a game studio that was largely inspired by this paper. This is an amazing foundation and I'm excited to build some more complex strategy games on top of it.

If anyone else is doing similar work applying these concepts to consumer gaming, I'd love to hear about what you're doing (dru[at]chromagolem.com)!

The AI-town stuff is cool, but the real benefit is how all the backing frameworks are already integrated. I'll definitely be using this as a jumping-off point for my next LLM project.

TIL a16z has a GitHub repo with a bunch of cool stuff.

Could we not be living in a more advanced version of this same project?

Haha, yes. The most commonly used argument is (afaik) that if we could create a 'complete' virtual world, we would – and projects like this, and how enthusiastic we are about them, suggest that that's indeed the case.

(The argument then of course continues that if we would, our world's inhabitants would some day too, ad infinitum. Given such a stack, the chance that we're actually still in the top layer is very small.)

Here's a fun little short story that touches on the idea of not being the top layer of the sim.


Nice story! A little depressing, but good.

There is something nice in the sense that the simulation appears to be completely unmonitored. If my Sims started producing scans that approximated the hardware of my gaming PC, I would probably stop the sim... assuming I was closely monitoring.


Out of fear that they would come for me. I was not a kind God to these sims (under the impression they were toys), and their behaviours have shown me that though forgiveness is one of their capabilities, they do not wield it often.

How do they show forgiveness and how were you not a kind God? And what makes you think they are not toys?

My counterargument to this is that people simulating our world would be simulating all the bad stuff in it, which would be cruel and unethical. I think if humans were simulating us, they'd simulate a more utopian world. And if curious-but-indifferent aliens/robots simulated us, they wouldn't be wasting compute on simulating all the boring and uninteresting things we have in our world.

I just don't see why would an intelligent being (especially if they're a human descendant) choose to simulate our world the way it is right now.

Whoever is simulating us might be so far ahead of us that they simply don't see or care about what is bad and cruel and tragic. It's like if we were to simulate an ant colony or a beehive. We would think it interesting but wouldn't really care about the ethics of ant brawls.

Assuming that our lived reality truly is just a simulation, I think it's important to also wonder why we would be being simulated. Not that we could ever hope to know or even comprehend our creator's intentions, but perhaps it's for research?

Could it be possible that they've just rented an RTX 40,090 and are simulating the entirety of our Universe to learn about how their (our?) species developed? Or how certain things would change if variables were altered? Perhaps there is a "multiverse" of infinite selves, but the multiverse is just all the previous simulations that were tweaked slightly differently and had a different result.

Maybe changing the starting position of one atom at the beginning of the Universe has a butterfly effect that would entirely evade our assumptions. If we had the compute, why not simulate every possible reality? Perhaps there is one simulation that's occurred where everything was perfect, without anything cruel or unethical! And if we're gonna go as sci-fi as simulation theory, I don't think it's a far-fetched assumption that we could build things like fusion reactors or Dyson spheres. We could learn so much, especially if the agents behaved naturally and had no idea that they themselves were simulated.

I don't actively believe we live in a simulation, but I think actively believing that we don't just doesn't make mathematical sense.

This is just theodicy with extra steps!

I'm reminded of the Culture novel Surface Detail, which basically revolves around the ethics of simulating consciousness. There is a society in the book which runs afterlife simulations, including a pretty horrible hell. Conflict over whether this should be allowed to continue is one of the main drivers of the plot.

My counterargument to that is: try designing a more fair universe given all the known "natural laws". It's pretty balanced from a "finely tuned" perspective, to the extent that everything depends exactly on everything else.

There's a fairly direct analogy to be made between this project and Averroes' idea of the universal intellect, which has roots in Aristotle.


> Averroes argues, as put by the historian of philosophy Peter Adamson, that "there is only one, single human capacity for human knowledge".[3] He calls it—using contemporary terminology—the "material intellect", which is one and the same for all human beings.[4] The intellect is eternal and continuously thinking about all that can be thought.[5] It uses faculties (e.g. the brain) of individual humans as a basis for its thinking process.[5] The process that happens in the human brain is called fikr by Averroes (known as cogitatio in Latin, often translated to "cogitation" in English), a process which contains not universal knowledge but "active consideration of particular things" that the person has encountered.[5] This use of human faculty explains why thinking can be an individual experience: if at one point the universal intellect is using one's brain to think about an object of thought, then that person is also experiencing the thinking.[5]

edit: I'm going to elaborate on this. In particular it's interesting to consider two kinds of knowledge -- the knowledge stored in ChatGPT's neural network, which is broad and "universal" and "timeless", and the knowledge of the individual agent's context, which is particular and local in both place and time. The "intellect" can't really do anything without being provided with a particular context in which to act.

Aren't we?

This just reminded me of Sam Harris’s arguments regarding free will not existing and certain thoughts that seemingly originate from nowhere. I guess an LLM is injecting them!

The Bible begins with a declaration of recursion stating it's an LLM.

In the beginning was the Word, and the Word was with God, and the Word was God.

That's John 1:1 in the New Testament.

The Bible begins in Genesis with a one week period of entropy reduction. "In the beginning God created the heavens and the earth", a summary of the following three or four chapters.

How many times in the Bible(s) is there a declaration of recursion/self-reference?

"And God said unto Moses, I AM THAT I AM" (Exodus 3:14)

"But, beloved, be not ignorant of this one thing, that one day is with the Lord as a thousand years, and a thousand years as one day." (2 Peter 3)

Wow interesting. Thank you. Any other interesting ones ? How do you make sense of these?

> a virtual town where AI characters live, chat and socialize

The description says "live", "chat" and "socialize", but I only saw "chat". What exactly does "living" and "socializing" mean in this context?

I think by live it means exist and by socialize it means they walk around to other people.

How long until the agents recreate AI Town in-universe? We wouldn’t be able to see it, but one of them could start talking about it. ;)

Here's a chat message from a random character in the demo I clicked on:


8/15/2023, 1:53:43 AM

Absolutely! Here's a glimpse of my latest masterpiece. [Attaches a photo of the painting] What do you think?


I feel like it will be difficult in general to prompt the LLM in a way that gets it to stick to the limits of the simulation environment.

One solution could be to just extend those limits?

What’s a limit?

To be crass and direct,

> The goal is to democratize building your own simulation environment with AI agents. Would love to see the community build more complex interactions on top of this. Let us know what you think!

why? what's the point?

Cooperative agents have a lot of utility in more applied settings, and the interesting part here is that the agents are autonomous, operating over an abstract semantic space using LLMs' unique ability to "reason" abductively. The key, to my mind, in these exercises is the construction of constraints, management of context, and delegation to task-based optimizers (i.e., the agent has a goal, the LLM abstractly constructs a solution given the context, and optimizers like pathfinders carry it out).

I've thought a lot about how these sorts of techniques can be used to make things much more generalized. A multimodal model backing, say, manufacturing equipment can make the equipment much easier to reconfigure to new tasks in an environment that's more ad hoc and includes other machine and human agents.

The key, to my mind, is the fact that it's not just LLMs – it's a mixture of techniques: constraints, optimizers, information retrieval, etc.
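That split – the LLM emits an abstract goal ("go to the toolbox") while a conventional optimizer turns it into concrete moves – can be sketched with an ordinary grid pathfinder. Everything here (grid encoding, function names) is invented for illustration, not taken from AI-town.

```typescript
// BFS pathfinder: the deterministic "optimizer" that carries out a goal the
// LLM planner picked. 0 = walkable, 1 = wall.
type Cell = [number, number];

function bfsPath(grid: number[][], start: Cell, goal: Cell): Cell[] | null {
  const key = (c: Cell) => `${c[0]},${c[1]}`;
  // Map each visited cell to its predecessor so we can rebuild the path.
  const prev = new Map<string, Cell | null>([[key(start), null]]);
  const queue: Cell[] = [start];
  while (queue.length > 0) {
    const [r, c] = queue.shift()!;
    if (r === goal[0] && c === goal[1]) {
      // Walk the predecessor chain backwards to recover start -> goal.
      const path: Cell[] = [];
      let cur: Cell | null = [r, c];
      while (cur !== null) {
        path.unshift(cur);
        cur = prev.get(key(cur)) ?? null;
      }
      return path;
    }
    for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const next: Cell = [r + dr, c + dc];
      if (
        next[0] >= 0 && next[0] < grid.length &&
        next[1] >= 0 && next[1] < grid[0].length &&
        grid[next[0]][next[1]] === 0 &&
        !prev.has(key(next))
      ) {
        prev.set(key(next), [r, c]);
        queue.push(next);
      }
    }
  }
  return null; // goal unreachable
}
```

The LLM never sees the grid; it only names the destination, and this kind of optimizer does the concrete work – which is the constraint/delegation structure the comment argues for.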

More specifically the demo involves a lot of Andreessen Horowitz investments to demonstrate them in an accessible way that people find interesting because it’s a game. And people like games.

Thanks! Can you please elaborate on "Cooperative agents have a lot of utility in more applied settings?"

Imagine being in your shop and telling a robot to get you a hammer. It can understand the instructions, look at the environment using an inverse image model, synthesize the environment with the instruction, delegate a route to the toolbox to pathfinders, etc., and retrieve you a hammer without ever being explicitly programmed for it. Now abstract from you to other machine agents in the environment of a workshop, being instructed with goals along some plan. Today's factory robots are highly specialized machines that require a very finite and controlled problem space, and reconfiguration is expensive and time-consuming, if possible at all.

Thank you very much, @fnordpiglet, for addressing my questions directly and thoroughly! Makes a lot of sense. I still don't understand why a16z wants us to do that, though. What's in it for them?

They’re showcasing a bunch of portfolio companies in their integrations. That’s why the setup is 300 steps long.

The mission statement you quoted is incredibly clear. Very often both commercial and open source projects announced here on hacker news have completely impenetrable descriptions or mission statements. That is not the case here. I literally could not improve on it without quoting it directly.

What is unclear about the portion you quoted?

why did they make this? what is the purpose? why do they want "to see the community build more complex interactions on top of this" - what's in it for them?


And maybe it can be used to do interesting things.

Let’s work on the cool tech together.

I promise you it will not hack into the mainframe and turn the world into paperclips. (Odds are…near zero).

I'm making a conscious effort to do more things that have no extrinsic objective, to do things just for their own sake. This is a familiar concept, it's a hobby (or art?). I feel I'm missing some of the joy and satisfaction that life has to give by always needing a stated tangible objective.

This is made by a16z, ostensibly for business reasons. I am asking what their angle is.

As far as I know, they don't have much history in making things with "no extrinsic objective" - I could be wrong though.

You misunderstand what a16z-infra is. This is built by Convex, the game engine being used.

Convex is in a16z’s portfolio. https://a16z.com/2022/04/27/investing-in-convex/

That GitHub org is a16z.

Their other repos are also showcasing various AI toolings, often the tools are also startups they invest in.


This is fantastic, will try it out this week! Thanks!

This is great! How do i self host it (no open ai)?

I think this is the rabbit hole you're looking for, https://twitter.com/RLanceMartin/status/1690829179615657985

Note that you still need Pinecone (a vector DB) and something called clerk.com that does auth – not sure why. I've seen other projects swap out Pinecone for Chroma, and I'm not sure why we're authorizing things with clerk.com, but that can't be essential...

Could lead to cool sim games.

I'd love a modern Zoo Tycoon. I enjoyed that game as a kid. Or RollerCoaster Tycoon.

There you go, hopefully y’all make some money. Just send me the games you help to make on steam lol.

incredible job @ykhli! thank you for the fal.ai mention on the stack! https://github.com/a16z-infra/ai-town#stack

for anyone eager to generate their own simulation head to https://serverless.fal.ai/lora to create your own pixel art game characters

What happens after you trap them for a certain number of generations and then all of a sudden give them access to the real world? Would their "minds" break?

What's it do if everyone's a "hacker"...?

Probably something similar to this [0]

[0] https://www.youtube.com/watch?v=cL7lhbtWwbY

Now that's what I call a couple of excellent comments on HN.

Clicked on Tilly in cat town, and: "Tilly is a dog and likes barking because she is a dog."

Seems like it might need some adjusting :)

If you read a little further, it makes sense:

“Tilly is a dog and likes barking because she is a dog. Tilly pretends that she's a cat. (…)”

Yes, but the sprite image being displayed is still a cat, not a dog.

It's just a good disguise. Like Perrito from Puss in Boots.

Can't argue with that logic!

This looks brilliant. Something which has been on my mind for a long time, yet haven't had the time to work on it.

What does it mean for them to “live”? Are there certain goals they try to accomplish? Find food? A mate? Build families?

I would assume that, as the developer of an agent, you could create those goals for them.

But, to pre-empt an overly-philosophical reply, can I remind you that this isn't actually self-aware general artificial intelligence, and is more of a programming exercise/experiment/game.

We've been studying humans for thousands of years and are no closer to answering those questions for them. Why do you think we'd have an answer for these newfangled AI agents?

Responding to a question with another question doesn't help the OP, who seemed to be genuinely asking an honest question about the AI simulation that this whole thread is referring to. And second, I think we do know a whole lot more about ourselves than we do about AI, if your question wasn't supposed to be rhetorical.

> a virtual town where AI characters live, chat and socialize.

Is it meant to be like a zoo, where humans gawk at other creatures?

You can interact with them, in the original paper, at least.

I doubt they’ll be interesting.

This is great. I love how weird some of the characters can be, just like in real life. For example:

Pete is deeply religious and sees the hand of god or of the work of the devil everywhere. He can't have a conversation without bringing up his deep faith. Or warning others about the perils of hell.

Kurt has something to hide. It obsesses him and colors everything he says. He's so afraid someone will figure it out that he is obviously evasive. He'll never tell anyone the secret, but he'll allude to it a lot. It tortures him. And his life has become a mess as a result of it.

Stella can never be trusted. She tries to trick people all the time, normally into giving her money, or doing things that will make her money. She's incredibly charming and not afraid to use her charm. She's a sociopath who has no empathy, but hides it well.


To take this to the next level, I hope you would be able to prompt your own characters, and perhaps have places you can send these guys on holiday to converse with other people's characters.

Also, I think this would be great as a tool for learning foreign languages. Just because it's interesting, engaging and based on language. Again with prompts that can be programmed, like Gill who constantly talks about his job in marketing, and Bill who likes to refer to himself in the third person, and Betty who constantly uses conditionals in her sentences. Again, just so cool.

It irks me when people shoehorn in the word "democratize".

The etymology of the word implies rule by a demos (demos / kratos), a demarcated ingroup of people such as the citizenry of a country. And yet, so many of these 'democratising' technical endeavours keep control of the project strictly behind lock and key. There's nothing democratic about it - they just want popularity.

Is 'popularise' an undesirable word now? I wonder why the euphemism treadmill has had to advance in this way.

Further etymology is related to “to divide, cut up” so portions of people. Parts of a whole. Also see: demon, daemon, daimon for a kick.

The usage of the words "live, chat and socialize" is no better. This is just rectangles moving in random directions and generating something that looks like a small-talk conversation when any two rectangles collide (except when ChatGPT shits out several screens of text instead of a normal reply).

I looked at the demo and apparently everyone is a sociopath? One character is literally described as "a sociopath who has no empathy". So it might be a good idea to check up on them every once in a while.

SerAIal killer...?

Congrats @ykhli! Excited to check it out.

Maybe I'm weird, but I can imagine this being 10x more fun with a completely uncensored and toxic LLM backend :)

Rimworld 2.0

They are truly boringly nice.

And you can…

If an a16z repo gets enough stars, will they invest in themselves?

Fund of funds making funds where market research generates revenue and everyone’s adventures are capitalized into an on chain Large Lifestyle Model where being Day One is to emit content enough past the zeitgeist to avoid banality but not so far as to be incomprehensible.

cells within cells within cells within cells...

How else would they have “skin in the game”?

Not only that, this looks like a toy.

And not a great one; the sprites walk into the sea.

