
Yes.

All video games are, by definition, interactive videos.

What I imagine you're asking about is this. A typical game like Doom is effectively a function:

  f(internal state, player input) -> (new frame, new internal state)
where internal state is the shape and look of the loaded map; the positions, behaviors, and stats of enemies, the player, and items; and so on.

A typical AI that plays Doom, which is not what's happening here, is (at runtime):

  f(last frame) -> new player input
and is attached in a loop to the previous case in the obvious way.
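
To make the "obvious way" concrete, here's a toy sketch in Python. Everything in it is made up for illustration (the `State` fields, the one-line text "frame", the chase-the-enemy agent); it has nothing to do with the real Doom engine or any real Doom-playing AI. It only shows the shape of the two functions and how they are wired together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical explicit game state: just two positions on a line."""
    player_x: int
    enemy_x: int


def render(state, width=8):
    """Draw the state as a one-line text 'frame'."""
    cells = ["."] * width
    cells[state.enemy_x % width] = "E"
    cells[state.player_x % width] = "P"  # player is drawn on top of the enemy
    return "".join(cells)


def game_step(state, player_input):
    """f(internal state, player input) -> (new frame, new internal state)"""
    dx = {"left": -1, "right": 1}.get(player_input, 0)
    new_state = State(player_x=state.player_x + dx, enemy_x=state.enemy_x)
    return render(new_state), new_state


def agent(last_frame):
    """f(last frame) -> new player input. Sees only pixels, never the State."""
    player = last_frame.index("P")
    enemy = last_frame.find("E")
    if enemy == -1:  # enemy hidden behind the player: nothing left to chase
        return "none"
    return "right" if enemy > player else "left"


def run(steps):
    """Attach the agent to the game in a loop."""
    state = State(player_x=0, enemy_x=5)
    frame = render(state)
    frames = [frame]
    for _ in range(steps):
        action = agent(frame)
        frame, state = game_step(state, action)
        frames.append(frame)
    return frames
```

Note that the game carries `state` forward explicitly, while the agent only ever gets the rendered frame.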

What we have here, however, is a game you can play but implemented in a diffusion model, and it works like this:

  f(player input, N last frames) -> new frame
Of note here is the lack of game state - the state is implicit in the contents of the N previous frames, and is otherwise not represented or mutated explicitly. The diffusion model has seen so much Doom that it has, in a way, internalized most of the state and its evolution, so it can look at what's going on and guess what's about to happen. Which is what it does: it renders the next frame by predicting it, based on the current user input and the last N frames. And then that frame becomes part of the input for the next prediction, and so on, and so on.
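
A rough sketch of that loop, again in Python and again with invented names. `predict_next_frame` is a trivial stand-in for the diffusion model, not the real thing: here a "frame" is just an integer position, and the "model" guesses the velocity from the last two frames. The point is only that velocity is never stored anywhere; it gets re-inferred from the frame history on every step.

```python
from collections import deque

N = 4  # hypothetical context length (number of past frames the model sees)


def predict_next_frame(player_input, last_frames):
    """Stand-in for the model: f(player input, N last frames) -> new frame.

    Toy rule: infer velocity from the two most recent frames, then add the
    player's nudge. A real model would be a learned, probabilistic predictor.
    """
    frames = list(last_frames)
    velocity = frames[-1] - frames[-2] if len(frames) >= 2 else 0
    return frames[-1] + velocity + player_input


def play(inputs, first_frames, n=N):
    """Autoregressive loop: each predicted frame is fed back in as context."""
    context = deque(first_frames, maxlen=n)  # oldest frames fall off the end
    generated = []
    for player_input in inputs:
        frame = predict_next_frame(player_input, context)
        context.append(frame)
        generated.append(frame)
    return generated
```

For example, `play([0, 0, 1, 0], first_frames=[0, 1])` gives `[2, 3, 5, 7]`: the nudge on the third step changes the implied velocity, and every later frame inherits that change even though no state variable was ever updated.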

So yes, it's totally an interactive video and a game and a third thing - a probabilistic emulation of Doom on a generative ML model.




Thank you for the further explanation. That’s what I had come to suspect in the meantime, and what I was trying to find out with my question.

That opens up a new branch of possibilities.



