Yeah. If it isn't doing this, then what could it be doing that is worth a paper? "r...

rvnx · 2024-08-28T16:44:46.000000Z

There is a hint in the paper itself:

It quietly acknowledges that it is based on: "Ha & Schmidhuber (2018) who train a Variational Auto-Encoder (Kingma & Welling, 2014) to encode game frames into a latent vector"

So they most likely took https://worldmodels.github.io/ (which is actually open source) or something similar, and swapped its frame generation for Stable Diffusion, which was released in 2022.