Hey folks, I’ve been working on using control-net to take in a video game level (input as a depth image) and output a beautiful illustration of that level. Play with it here: dimensionhopper.com or read the blog post about what it took to get it to work. Been a super fun project.
I wonder how far off we are from something like the battle school game from the book Ender's Game. That is, an immersive video game that uses player actions, choices, exploration etc. in order to generate not only new content, but entirely new game rules on the fly. It feels like we're getting closer and closer to Ender's holographic terminal with VR interfaces + AI content.
It's not that hard to get these LLMs to generate new game rules; I've made some prototypes that use them to do exactly that. The hard part is getting them to generate rules that are actually fun to play and somewhat balanced. The fact is, it's hard to know whether something is fun until you actually play it.
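For the curious, the skeleton of such a prototype can be as thin as a single prompt. A minimal sketch, assuming the official OpenAI Python client; the model name, prompt, and JSON schema are illustrative, not a description of any particular prototype:

```python
# Minimal sketch: ask an LLM for a new game rule as structured JSON.
# Model name, prompt, and schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": (
            "You are a game designer. Invent ONE new rule for a 2D platformer. "
            'Reply as JSON: {"name": str, "trigger": str, "effect": str}.')},
        {"role": "user", "content": "The player keeps dying on spike pits."},
    ],
)
print(resp.choices[0].message.content)
# Generating the rule is the easy part; playtesting it for fun and balance is not.
```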
We're building something like that, but the images are sadly static per story page, and there's no psychological analysis. I've been interested in adding pupil dilation as a measure of cognitive load while you read, but I think customers would find it too creepy.
With Apple's new Vision Pro, which can do iris scans and also present content in VR, your idea might actually be very interesting - with user consent, of course.
While I have folks' attention: I want to try training a model to generate monster/creature walk animations. Anyone know of a dataset of walk-cycle sprite sheets that I could massage and label to see if I can make that work?
I have an idea for you to try. Instead of training a model to produce subsequent animation frames (which is tough), take a model trained on pixel art sprites in general and pair it with a ControlNet, where the ControlNet input is either a pose model or a higher-res 3D render of a generic dummy character made in Blender. Then generate output frame by frame, keeping the prompt the same but advancing the ControlNet input one frame at a time.
To get it down to small 'sprite' pixel scale, the right approach may be to output 'realistic' character animation frames this way, and then 'de-res' them via img2img into pixel art. The whole pipeline could be automated so that your only inputs are a single set of varied walking/posing/jumping ControlNet poses and the prompts describing the characters.
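Sketched with diffusers, the first half of that pipeline might look roughly like this (the OpenPose checkpoint, file layout, and settings are illustrative assumptions, not a tested recipe):

```python
# Rough sketch: fixed prompt + fixed seed, ControlNet pose input varied per frame.
# Checkpoint names and paths are illustrative.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

prompt = "a goblin warrior, full body, side view, game character concept art"
for i in range(8):  # one render per walk-cycle pose
    pose = load_image(f"poses/walk_{i:02d}.png")  # rendered from a Blender dummy rig
    frame = pipe(
        prompt,
        image=pose,
        num_inference_steps=30,
        generator=torch.Generator("cuda").manual_seed(42),  # same seed every frame
    ).images[0]
    frame.save(f"frames/walk_{i:02d}.png")
# Each frame would then be de-res'd into pixel art via img2img, as described above.
```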
There are a lot of sprites to work with. As I'm sure you're aware, there are artists known for making animations, like Pedro Medeiros; spriters-resource.com has material from thousands of games; you can buy pixel art assets from the Unity Asset Store, itch.io, and stock art sites; and you can use DevX Tools Pro to extract assets from hundreds of 2D pixel art Unity games. All told, there are maybe 100,000 to 1 million examples of high-quality pixel art you could scrape. It's also possible that this material already exists in the major crawls and just needs to be labeled better.
A few people have tried training on sprite sheets and emitting them directly, and it did not work.
A few people have been working specifically on walking cycles, and it has a lot of limitations.
In my specific experience with other bespoke pixel art models, if you ask for a "knight," you're going to get a lot of the same-looking knight. Fine-tuning will unlearn other concepts that are not represented in your dataset. LoRAs have not been observed to work well for pixel art. You can try the Astropixel model, the highest quality in my opinion, for prototyping.
Part of this is that you're really observing how powerful ControlNet, T2I-Adapters, and LoRAs are, and you may have the expectation that something else you, a layperson, can do will be similarly powerful. Your thing is really cool. But is there some easy trick for animation that skips all this science? No. Those are really big scientific breakthroughs, and with all the attention on video - maybe 100-1,000 academic and industry teams working on it - there still hasn't been anything super robust for animation that uses LDMs. The most coherent video right now is happening with NeRFs, and a layperson isn't going to make that work coherently with pixel art. Your best bet is to wait. That said, I'm sure people are going to link here to some great hand-processed LDM videos, and maybe there's a pipeline involving hand artwork that a layperson can do today that would work well.
That seems counter to the current AI mentality though. Clearly if it's online and available it's part of your AI playground so go nuts.
Edit: In case my sarcasm isn't clear, I hate this mentality and I am just bitterly griping about AI into the void. You should definitely ask permission to use data before training AI on it, but that will put you behind other AI people who aren't asking permission
I feel like Jump 'n Bump probably has a special place in the hearts of people who had access to the internet at a particular time. The internet was available, but multiplayer online gaming was still out of reach for many who had it - so there was an amazing niche of fairly polished indie local multiplayer games. Imagine being told back then, while playing it, what would be possible a few decades later.
Amazing advances this year. Remember the guy who created the 2D platformer that's based on time - what was it called again? He spent around $100k+ just on the art, which I'm pretty sure was a huge expenditure for him. With this software, he could have done it virtually for free, without much artistic talent at all.
The sad thing, though, is that the $100k on art wasn't wasted. It allowed an artist to make art and a living. I'm down with the tech - I've even written a typing game that generates Minecraft stories with an LLM and imagines them with Stable Diffusion, for my daughter to learn typing with. But my mom is an artist, and she spent her career starving between sales and commissions. The fact that the models have ingested artists' work and careers and can now replace them is sad.
On the other hand, I've never been an artist myself, so I've never been able to make my game ideas come true until now. The world is much more open to me on the creative side, in a way my mechanical skills previously prevented.
Artists will continue to make art because it's a compulsion. But I wish we had a world that was less oriented towards rewarding meaningless toil and would at least allow our born artists, writers, and creators the chance to pursue their obsessions to our benefit. Especially as we move post-scarcity, I hope we can build a WPA-like entity - perhaps, in a crazy twist, funded by AI?
We're never moving post scarcity. There is no scenario where that outcome is going to occur. Logistics alone guarantee that won't happen. Existence by its very nature is bound up in scarcity. There will always be various critical elements that provide a scarcity restriction on humanity and the rest (the less scarce) will always collide against that scarcity bottleneck.
The artists can now create on the tech side too, courtesy of inbound GPT-like LLMs. This isn't a one way street. The techies can craft art, the artists can craft tech.
It's opening up enormous pathways whether you're a programmer or artist. The artist has to be willing to expand and take on more responsibilities, just as the techie does if they want to craft quality AI art for a game.
With all the various game engines available now, we're not far from being able to have an LLM build nearly all of the software side for you via prompting. From there you can bring whatever your strength is to customizing and implementing the game. Maybe you're good at ensuring high-quality gameplay, maybe you're an artist with an elite eye for how things should look, maybe you're a programmer and your game will be better optimized (and so on).
We already are with digital goods; society just hasn't caught up yet. I can make essentially infinitely many copies of, say, Braid, and give every person in the world with an Internet connection a copy of it for a couple thousand dollars, using Cloudflare R2's free egress bandwidth and BitTorrent. A couple thousand dollars is basically a rounding error in the scheme of things. As I am not Jonathan Blow, distributing Braid would be a violation of copyright law, but copyright law is just a social contract that we entered into to incentivize the creation of work. If Jonathan Blow were compensated for every copy of Braid out there, I'm sure he would be quite happy to be (even more) rich.
So even in a post-digital-scarcity world, artists and programmers need to get paid, and so we have various DRM schemes - the first of which is the copyright system itself - but that works about as well as trying to make water not wet. Movies are leaked onto torrent sites like RARBG (RIP) and people make copies all day long. Libgen mirrors are still around despite the best efforts of the copyright regime. But let's be honest with ourselves: digital goods are already post-scarcity. We just haven't figured out how to incentivize the creation of works in our half-post-scarcity world, and we have no idea how to move forward.
Alternate solutions are out there, but we have no experience as a society in upending large social contracts (like copyright). You can easily imagine a system where what's popular gets tracked and money flows to the creators of the media that people are actually watching and consuming. It would be a more draconian system than the DRM we have right now, but on the other hand, if it promotes the arts, then maybe it's worth it.
Post-scarcity doesn't mean "I dream of a palace and it appears." It means the basic necessities of life are available to all without everyone laboring. I think we are actually there, but have induced a scarcity economy with extreme imbalances and fake incentives to toil for toil's sake.
The artist will make art without reward. They always have. But few can do it for a living. Fewer will in this next phase.
> Post-scarcity doesn't mean "I dream of a palace and it appears." It means the basic necessities of life are available to all without everyone laboring. I think we are actually there, but have induced a scarcity economy with extreme imbalances and fake incentives to toil for toil's sake.
To be clear, that is according to a definition you've made up. The commonly understood definition includes many or most of people's desires: travel, housing, vacations, iPhones, advanced health care, entertainment, higher education, etc.
> I think we are actually there, but have induced a scarcity economy with extreme imbalances and fake incentives to toil for toil's sake.
Even in Western countries, millions of people are struggling with the basic costs of housing, energy, and even food - people who actually work many hours every day and earn money. We seem to be a long way off even for your "basic necessities" definition.
Post-scarcity does not mean that scarcity has been eliminated for all goods and services but that all people can easily have their basic survival needs met along with some significant proportion of their desires for goods and services.
> Artists will continue to make art because it's a compulsion. But I wish we had a world that was less oriented towards rewarding meaningless toil and would at least allow our born artists, writers, and creators the chance to pursue their obsessions to our benefit.
Everybody is an artist, and everybody can create and share art for themselves and the people around them. That is art at its best, in my opinion. Commercialization and mass production and display and distribution of "art" is what degrades it for me.
> Especially as we move post-scarcity, I hope we can build a WPA-like entity - perhaps, in a crazy twist, funded by AI?
Post-scarcity will never happen because even the people with mansions in many countries, billion-dollar yachts, and fleets of private airplanes do not have enough. And 10 billion people can't even live like them. Why would the people who own most of the capital suddenly collectively decide they would like to share the production with the rest of the world?
But if it did happen, we should not pay artists for art. We should give everybody enough so that if anybody wanted to paint a picture or hammer some metal into a horseshoe or knit a sweater or write some assembly code, they could do so.
> But - my mom is an artist and she spent her career starving between sales and commissions.
Why should we care about your mum, and not the guy who makes buggy whips? Or the guy who runs the ferry across the river, who will be out of a job because of the new bridge?
It feels like a lot of people on the pro-AI side of this debate think making art is some kind of magical activity that is totally inaccessible to them. The reality is the barrier to entry is a pen and paper and 10 minutes a day. There are a ton of resources online to learn about how to produce art without surrendering all your creativity to the AI slot machine, that don't involve stealing 400 million images.
I could glibly say the same about programming but it’s just not true. Anyone can learn to write a script to do some simple task, but they are not a programmer. The ability to go from a vision to a high quality output that accurately represents your vision requires years of work for most people, in any field.
And yet, so ready are we to discard those millennia of earned experience in favor of statistically generated sludge. Instead of glibly saying the same about programming, we glibly dismiss the role of the artist in making art. In another comment on this thread, someone talks about how Jon Blow could have saved $100,000 on art for Braid if he had used an ai system like this, perhaps not understanding that the art of Braid is a critical component of what makes it Braid.
If Karma is real, games made with uninteresting AI art will fail as we generate a sea of low-effort sludge.
Yeah, that won't happen. As with anything, by and large most people don't care. As long as it's fun, people will play. That statistically generated sludge often looks better than what most humans come up with, and sure, some (very few) humans are still better, but not for very long. That said, the speed and price of using AI are so much better, and if you have zero talent, like me, you can iterate for days for pennies. There is no going back.
I've always practiced art - after all, my mother is an artist. I simply don't have the mechanical aptitude. I enjoy creating my art, but my art is entirely mine and has no aesthetic value to anyone else. Making a game requires the ability to create a consistent style that speaks to a topic. I can't do that despite all my practice. I think I'm not alone.
I tried to learn; I took years of courses. I cannot. I have no feel for it. AI, for me, is a lifesaver for game art and web design. In a Pepsi test, I found no one who could see the difference between human and AI (among our clients, who are all who matter for this - and that was for web design, not art), so that's that then.
At the quality of the current output, I think players still easily differentiate between AI generated art and hand-created art. Maybe in future versions this will be less noticeable.
As a game dev, I think at this stage AI can be a helpful utility, but it does not replace a designer's touch for professional-looking games.
If these levels were used in a game I was playing, they would certainly not stand out to me as AI generated. It's possible that if I was specifically asked to figure out whether they were AI, I would succeed, but even that I'm not sure of.
The AI stuff is a style though. I'm seeing it happening now in the art world, where the quirks of the model become part of the appeal of the work. Won't be long until a game with good enough mechanics comes along and blows up I think.
For better or worse, I agree with your sentiment, but that will probably change. Consider how many kinds of foods and clothing are mass produced; we often consider something made by hand to be precious, and even of higher value, but we have become accustomed to the tradeoffs of cheaper solutions. It may not be our generation, but it's conceivable that future generations will be less inclined to differentiate as we do (if only based on exposure to what this kind of art generation offers at an early age).
I'm not convinced this is true. The economics of cultural production are far more winner take all than the economics of food and clothing production. Higher quality work gets more of the limited attention in that economy. AI work is doomed to fail because of this dynamic.
Everything gets re-appropriated by art, from mpeg frame skips to messy bedrooms.
Many of the quirks of our technology, like audio distortion for example, quickly become key components of certain styles. I remember as a child growing up in a funny valley after the acceptance of analogue distortion but before the widespread adoption of digital distortion.
Right now I'm thinking of someone like James Gerde, where the frame-to-frame shifts of AI imagination are part of the aesthetic. I think it's only a matter of time before this effect is matched up with something that makes emotional sense, and then it will blow up.
Using it that way requires intention. Developing that intention is hard to do if all you ever do is pull the one-armed bandit hoping for AI to produce what you want.
I was going to comment that the contrast between the beautiful illustrations and the red blood that violently explodes when you kill an opponent is pretty funny. But then I looked up the original Jump 'n Bump and it's just as gory, if not more! Good ol' 90s games.
Love to see a write-up on your Hugging Face Diffusers experience: setting that up, what your dev cycle and stack look like, whether you're hosting that server on a GPU cloud instance, or what. Those kinds of details are very interesting.
I'm on an EC2 instance. Most of the effort was on CUDA/PyTorch/pip nonsense. Once stable-diffusion-webui was working, diffusers worked basically out of the box, which was really nice. (The trickiest bit was figuring out that I needed to use their tool to convert my safetensors file, and that the version of Python I was using wasn't working with it for some reason.) The stack is Flask + gunicorn, which was what ChatGPT recommended (lol). I had a websockets version of the progress bar working with flask-socketio on my local machine, but I could never get the server version to work correctly through nginx. So eventually I just gave up and switched it to polling so I could launch.
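For anyone curious what the polling fallback looks like, a minimal sketch of the shape of it (names and structure are my guesses, not the site's actual code):

```python
# Minimal sketch of the websockets -> polling fallback: the generation code
# updates shared progress state, and the page polls a JSON endpoint.
from flask import Flask, jsonify

app = Flask(__name__)
progress = {"step": 0, "total": 30, "done": False}

def on_step(step, timestep, latents):
    # diffusers pipelines accept a per-step callback shaped like this one
    progress["step"] = step

@app.route("/progress")
def get_progress():
    # the client hits this every second or so and updates the bar
    return jsonify(progress)

# run under gunicorn, e.g.: gunicorn -w 1 app:app
```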
Does the 2D data like platforms and hitboxes still match the input entered by the human? If yes, I wouldn't say this is using AI for level editing; this seems like using AI for level artwork generation. Impressive nonetheless, just different.
HN's submission title ("Show HN: Stable Diffusion powered level editor for a 2D game") made me think of the former. Article title ("2D Platformer using Stable Diffusion for live level art creation") was more accurate to me.
I've recently tried using InvokeAI to apply a specific style as a texture mod for the original Max Payne, along with RTX Remix. Instead of making the textures "modern", I was attempting to mix them into a noir rendition, similar to Sin City, but less cartoonish. Unfortunately, it was really hard to get InvokeAI to restrict itself to the UV boundaries - detail was always leaking, and it didn't look very good when rendered.
I'm also curious if anyone has made a level that worked particularly well/poorly or has a great custom theme (that maybe I should add to the dropdown) :)
The main one is that making the ControlNet depth input look like something helps a ton. You can create levels that have more 'structure' (large flat platforms, platforms that line up with others) and levels that are more random, and see that the structured ones work way better. I played around a lot with turning the ControlNet on and off at the beginning and end of generation, which seemed to help when I was playing in the webui, but then I didn't immediately find the API in diffusers, and the results I was getting were great, so I didn't keep looking.
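If anyone else goes looking, diffusers does appear to expose this on the ControlNet pipeline via control_guidance_start / control_guidance_end (my reading of recent versions - worth verifying against yours). A rough sketch, assuming a pipeline and depth image are already loaded:

```python
# Rough sketch: limit ControlNet influence to a window of the denoising steps.
# Assumes `pipe` is a loaded StableDiffusionControlNetPipeline and `depth`
# is the level's depth image; values are illustrative.
image = pipe(
    "lush alien jungle, 2d platformer level, detailed illustration",
    image=depth,
    controlnet_conditioning_scale=1.0,
    control_guidance_start=0.1,  # let composition form freely for the first 10% of steps
    control_guidance_end=0.8,    # release the depth map early so details can deviate
    num_inference_steps=30,
).images[0]
```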
Depth is a useful parameter for controlnet, especially when you want really specific forms. I've found that it can hamper outputs because blank sections of solid color are interpreted as flat walls, when really I'm trying to make those parts ambiguous!
Sure, but I'm more interested in things that were just impossible before. You can hire an artist to illustrate a level, or have AI do it cheaper, but you can't have an artist illustrate a level the player made while they wait. I think there are whole play patterns that become possible because the cost and especially the speed of creating the art are many orders of magnitude different.
I think even more interesting is generating entire styles and storylines that evolve infinitely and coherently based off of seeds. Players could even inject a concept - "steampunk" or "Discworld" - and an LLM could construct the story, with characters and visual themes.
Several of the themes (including "alien jungle", which is my favorite) were created by ChatGPT. I totally want to try evolving the game in that direction.
I think this could be great for letting players design custom weapons/armor/spells very roughly and then using AI to convert them into something that looks good in the game.
It appears to be a group of people who are more willing to believe in transdimensional travel being responsible for changes in their life than things happening as a result of their outlook, luck, or their own agency.
This is good for procedurally generated 2D worlds. Think Hollow Knight, but expansive across infinite environments. Just randomly generate the control image and have the LLM generate the theme. Combine that with LLM-generated lore and the possibilities are unlimited.
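The control image really can be dumb-simple. A toy sketch of a random depth-map generator (dimensions and PIL usage are mine; real levels would need reachability checks):

```python
# Toy generator for a ControlNet depth input: white platforms on black.
# Purely illustrative; a real game would validate playability.
import random
from PIL import Image, ImageDraw

def random_depth_map(width=512, height=512, platforms=10, seed=None):
    rng = random.Random(seed)
    img = Image.new("L", (width, height), 0)  # black = far
    draw = ImageDraw.Draw(img)
    for _ in range(platforms):
        x = rng.randrange(0, width - 96)
        y = rng.randrange(32, height - 16)
        w = rng.randrange(64, 192)
        # white = near; each rectangle becomes a platform in the depth map
        draw.rectangle([x, y, min(x + w, width - 1), y + 16], fill=255)
    return img

random_depth_map(seed=7).save("control.png")
```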
I have a far simpler (I imagine?) case already in mind, from the recent Cities discussion thread:
>> I would've expected at least a not grid-based zoning so that buildings on curves look more natural. All these empty pieces of land in between buildings look really bad and kind of force us to make grid cities. And that is not even an innovation, it was already present in the SimCity series. But some procedurally generated buildings for smooth corners and connecting buildings would be nice.
> It's hard to make assets that would work with every curve. When you see screenshots of nice cities like this, people are using mods to hand-place assets, with them clipping into each other, to make a unified wall of buildings along the curve or corner.
ML-generated building configurations for city-builder games: readily adaptable to any shape, and as a bonus they can break up excessive repetition a bit. If you want to be ambitious, train a model on real-world aerial photos.
Would this require ML? I would think this could be accomplished with some rudimentary procedural generation: divide the large space up into building-sized plots, then generate a building to fit each space.
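Something like this toy version of the plot-splitting step (all parameters invented for illustration):

```python
# Toy version of "divide the frontage into building-sized plots":
# walk along a street segment and emit lot widths until the space is used up.
import random

def split_frontage(length_m, min_w=8.0, max_w=18.0, seed=None):
    rng = random.Random(seed)
    plots, used = [], 0.0
    while length_m - used >= min_w:
        w = min(rng.uniform(min_w, max_w), length_m - used)
        plots.append((used, used + w))  # (start, end) along the curve
        used += w
    return plots  # each plot then gets a building generated to fit its width

print(split_frontage(100.0, seed=1))
```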
Yeah, in the map editor there is, in fact, a random button that generates a level. I haven't gotten around to making sure the random level is playable (about 1 in 4 have unreachable areas), but that wouldn't be hard to add. (I've been focused on the creative aspect of building your own levels, because right now that part is more fun.)
There are no books, man. This stuff is too new. But all the components are in place and exist.
This guy just demonstrated what's required to generate a theme, and it's not far off from extrapolating further: use an LLM to generate lore and some random maze generator to create the base control image.
Doubtful. The developer is really old school. Additionally, he's already spent years integrating his huge lore generator into the fabric of the game. An LLM would throw a wrench into the whole process.