Deep Learning for Procedural Content Generation – a survey (arxiv.org)
118 points by homarp 13 days ago | 18 comments

There have been a few attempts at communicating deep learning methodology for procedural generation within the roguelike community, to use a specific example. While there is typically some high-level interest, it quickly fades and developers go back to using traditional procgen methods.

One of the hangups that I've observed is that many of the introductions to deep learning for procgen (and to deep learning in general) use SL, USL, and AL, which require datasets containing many examples. The time cost of gathering or creating these examples is not appealing to procgen devs, so they continue using the procgen equivalent of symbolic AI. RL in procgen is largely avoided for the same reasons that RL is rare in other domains.

Despite this, I believe there is a connection between the fields through the lens of optimization problems. Typically procgen practitioners have a handful of parameters which they hand-tune to arrive at their desired results. The number of parameters is kept low so as not to exceed the cognitive capabilities of the practitioner. I'm a believer that by turning many of the current discrete procgen algorithms into continuous-valued generators, the number of generator parameters can be increased a thousand- or million-fold so long as an appropriate loss function can be crafted (in practice this isn't very hard and proxies can even be used in tricky cases). In a lot of ways this amounts to a reparameterization that makes for more salient generator parameters.

For me one path forward is crafting continuous versions of existing discrete generators and using autodiff tools like JAX to optimize procgen parameters. This whole rant is pretty specific to the roguelike domain and probably doesn't carry over well to other spaces. Huge YMMV disclaimer.
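A minimal sketch of what that could look like, under stated assumptions: the discrete "wall or floor" decision per tile is relaxed to a floor probability, and the loss is a hand-picked proxy (target floor density, agreement between neighbouring tiles, and a push towards 0 or 1). The grid size, loss weights, and learning rate are all hypothetical choices for illustration, not something proposed in the thread.

```python
import jax
import jax.numpy as jnp


def soft_map(logits):
    # Continuous relaxation: each tile holds a floor probability in (0, 1)
    # instead of a hard wall/floor choice.
    return jax.nn.sigmoid(logits)


def loss(logits, target_open=0.45):
    p = soft_map(logits)
    # Proxy 1: the overall fraction of floor tiles should be near the target.
    density = (jnp.mean(p) - target_open) ** 2
    # Proxy 2: neighbouring tiles should agree, which favours contiguous regions.
    smooth = jnp.mean((p[1:, :] - p[:-1, :]) ** 2) + jnp.mean((p[:, 1:] - p[:, :-1]) ** 2)
    # Proxy 3: push probabilities towards 0 or 1 so thresholding loses little.
    binarize = jnp.mean(p * (1.0 - p))
    return 10.0 * density + 0.5 * smooth + 0.5 * binarize


@jax.jit
def step(logits, lr):
    # One plain gradient-descent update; autodiff supplies the gradient.
    value, grads = jax.value_and_grad(loss)(logits)
    return logits - lr * grads, value


key = jax.random.PRNGKey(0)
logits = jax.random.normal(key, (32, 32))  # 1024 generator parameters
initial_loss = loss(logits)

# The loss terms are means over tiles, so per-tile gradients are small;
# that is why the (assumed) learning rate is large.
for _ in range(200):
    logits, _ = step(logits, 100.0)

final_loss = loss(logits)
level = soft_map(logits) > 0.5  # threshold back to a discrete map
```

Here every tile is a parameter, which is the "thousand-fold more parameters" regime; whether the resulting maps are any good depends entirely on how well the proxy loss captures what you actually want.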

I'm skeptical that machine learning has much to offer the typical roguelike developer.

Most roguelikes approach a player/developer ratio of zero. For every successful shipped roguelike with actual players, there are probably a thousand hobby roguelikes with no users but a happy developer or two whiling their time away. This is not a particularly lucrative corner of the gamedev world, so most are in it for the intrinsic motivation.

Handcrafting a level generator is a lot of fun. Training an ML model might be fun too, but it seems like a very different sort of activity, and not necessarily one that appeals to roguelike devs.

> so long as an appropriate loss function can be crafted (in practice this isn't very hard and proxies can even be used in tricky cases).

I'm skeptical that this is as easy as you suggest. Everyone thinks making something "fun" is easy until they sit down to try. Fun is the knife-edge between familiarity and surprise. Optimizing for it is probably in the same order of difficulty as predicting the stock market: If you train your generator to be particularly fun in a certain way, players will quickly grow tired of it, which isn't fun.

> Fun is the knife-edge between familiarity and surprise.

If we can autogenerate infinite fun, the human race is doomed.

I gave a talk on procgen last year where one of my slides attempted to sum things up thusly:

> In my experience good procgen is the creative application of boring algorithms. Your key weapons are imagination and iteration at the design stage - but the actual code tends to be simple, and fancier algos don't make your content any better.

> In contrast with ML, you live or die by how magical your data and algorithms are, but you have little direct control over the output so it's not a very creative process. You tend to pull the levers and hope that what comes out will be suitable.

To me that difference is why ML tends not to work well for procgen problems. With creative tasks we tend to have top-down design constraints that we want to impose (e.g. a minecraft village needs to fulfill various requirements for the game engine to treat it as a village), and the task is to make content "interesting" while meeting those constraints. With procgen this is straightforward - you start with the constraints, and creatively iterate on the "interesting". Whereas with ML you often get interesting for free, but if your output doesn't fulfill this or that constraint, it's not very obvious how to proceed.

Perhaps a dumb question from one fairly unschooled in ML: Can't ML be used to augment traditional procgen methods to create unique ornamentation and art styles on top of traditionally procedurally generated level layouts?

IMO, despite how interesting procgen levels themselves can be, they ultimately suffer from "sameness" - similar looking building facades, trees, stonework, etc. become boring fast. Couldn't one create separate training sets such as plants in varying biomes, unique architectural styles, and divergent cultural artwork, and then apply these sets to create new textures and primitives for decorating levels? This would keep different levels/environments/dungeons/whatever feeling fresh while saving on the laborious work of drawing "stone texture #43" or having the same painting in every tavern in the land.

Obviously I'm thinking of graphical rogue-lites like One Must Fall or In Death, or RPGs with procedurally-generated elements, not classic text-based games like Rogue or Hack.

I think this is an insightful question and potentially a good idea. It's common practice to generate with GANs in several stages, each one elaborating on an initial output that is often cruder or vaguer.

I see no reason why principles like this https://dl.acm.org/doi/10.1145/3306305.3332370

or this https://github.com/EvgenyKashin/stylegan2-distillation

couldn't be used to create much richer environments, at least in terms of media.

Thank you for the links. These projects are fascinating, and precisely what I was thinking about. I wish I had more time, skill, and knowledge to delve into ML and GANs.

The problem here is that the differences have no meaning. Which is to say variation that is skin deep isn’t very interesting.

I would counter-argue on three fronts:

1. A lack of variation, even skin deep, is even less interesting than only skin-deep variation. That is often the norm in large "open world" environments.

2. Much of what we find most interesting about the world is, in fact, skin deep. Take humans for example; all fit the same mold, but the range of human variation is endlessly fascinating to artists and audiences alike. Similarly, the layout of the Prati region of Rome and the Marina/Cow Hollow district of San Francisco is not wildly different. However, both are distinct, worthy of visiting, and unlikely to be confused with each other.

3. Current methods of procedural generation, without AI/ML, already result in fairly interesting maps. Could "true" AI do better? Perhaps, but I'm not sure that the delta is so great that the results would be noticeably better than a well constructed procedural algorithm (play "Tea for God" if you have a VR HMD and a lot of space). I could be wrong though - the gold standard remains hand-placed, human-built worlds filled with little surprises. If an AI approach resulted in a level design competing with the best of bespoke worlds, particularly at a lower development cost, then I'm all for it.

1. That rather depends on the thing varying and whether it's important or not. Further variation can act to obscure rather than improve things that are interesting. For example, having a thousand unique leaves under a tree is probably not interesting in a game where they are incidental. Likewise, having a thousand slight variations of orc isn't really purposeful. And having a thousand very different looking creatures that all act the same is boring and confusing. Which is why I talk about meaning. For an example of proc-gen variation with purpose, look at projects like Ultima Ratio Regum.

2. I'm not sure these examples are skin deep to be honest in that I don't think it's just the visual variation that makes these comparisons interesting. There's some underlying meaning being glossed over.

For me the majority of AI generative/artistic projects just create facsimiles. They lack the spark of human input and often only really create mild interest when they get it through hand curation.

Yes, I think the whole problem for procgen is that it's hard to find an optimisation problem related to it. If everything goes well with "symbolic programming", there's no reason to change.

Its almost

> If everything goes well with "symbolic programming", there's no reason to change.

Yes and no. The elephant in the procgen room is the "1000 bowls of oatmeal" problem[1], i.e. perceptual uniqueness. Many procgen practitioners want to maximize serendipity but don't see a path toward that. I expect that methods explored in academia will filter down to practitioners, but the speed at which this occurs is largely modulated by the number of people with a foot in each world and willing to communicate with both sides. The same kind of gap exists within programming language theory and design, though because it is a bigger field, it benefits from network effects to a greater extent.

[1] https://galaxykate0.tumblr.com/post/139774965871/so-you-want...
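One crude way to put a number on the oatmeal problem, sketched under heavy assumptions: describe each generated level by a feature vector and measure the mean pairwise distance across a batch. The features here are random stand-ins (in practice they might be things like room count or corridor length, which is a hypothetical choice), and raw feature distance is only a rough proxy for perceptual uniqueness.

```python
import jax
import jax.numpy as jnp


def mean_pairwise_distance(features):
    # features: (n_samples, n_features), one row per generated level.
    diffs = features[:, None, :] - features[None, :, :]
    # Small epsilon keeps the square root well-behaved on the zero diagonal.
    dists = jnp.sqrt(jnp.sum(diffs ** 2, axis=-1) + 1e-12)
    n = features.shape[0]
    off_diagonal = 1.0 - jnp.eye(n)  # ignore each sample's distance to itself
    return jnp.sum(dists * off_diagonal) / (n * (n - 1))


k1, k2 = jax.random.split(jax.random.PRNGKey(0))

# "Oatmeal": every sample is a tiny perturbation of the same base level.
oatmeal = jnp.ones((1, 16)) + 0.01 * jax.random.normal(k1, (100, 16))
# A batch whose (stand-in) features genuinely differ from sample to sample.
varied = jax.random.normal(k2, (100, 16))

oatmeal_score = mean_pairwise_distance(oatmeal)
varied_score = mean_pairwise_distance(varied)
```

A low score flags a batch where everything is technically unique but nearly identical. Because the function is differentiable, a score like this could in principle be added as a term to a generator's loss, though choosing features that track what players actually perceive is the hard part.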

Well even the real world looks like oatmeal after 50+ countries...

Without having read the paper, I believe the approach forward should be inspired by GPT-3. If the AI responsible for procedural generation can look at a very large set of existing games, then it might be able to generate the content given a few examples (i.e. few-shot).

Interestingly, there's very little discussion of the Transformer architecture aside from the known text implementations like AI Dungeon 2 (using GPT-2/GPT-3).

There's no mechanical restriction from using it on text only (with OpenAI's ImageGPT being an extreme example), and it will be interesting to see how/if it competes with LSTMs for nontext domains.

It seems to not support going in the reverse though, e.g. giving a vector in a low dimensional space and inverse transforming to a hypothetical generated document. Closest to this is the context starting text but it's not the same.

To be clear, I'm describing using transformers as autoencoders

If it produces interesting content as good as or better than traditional methods or has the same level of ease as traditional methods, then sure why not, otherwise why?

In the end, the results are what's important. It's cool to have fancy, new ways of doing things. But if something works, and the new way isn't significantly better or easier than the traditional ways, and it ends up being out of reach or significantly different from them for little to no gain, then, yeah, it's not likely to be widely adopted.

That all being said, it wouldn't surprise me if some day, some enterprising deep learning fan does come up with some new interesting or more efficient thing that sparks a new line of research and innovation.

It's the way with a lot of things, the early adopters and enthusiasts pave the way for the others.

I tend to regard a lot of machine learning, deep learning, AI kind of stuff as overblown, buzzwordy euphemisms for fancy algorithms with internal memory and data analysis and manipulation capabilities.

But, that doesn't mean there aren't still innovative things that can be done with it.

Likely, someone will come along at some point with something easily accessible made with deep learning that revolutionizes procedural generation and helps in creating more realistic random environments. We're not there yet though, hence the general disregard for it.

I'd completely disagree. For me the most exciting thing about procedural content generation is that it can give you some insight into yourself or the content by examining the algorithm used to generate it. Deep learning loses this.
