It says in a shy way that it is based on:
"Ha & Schmidhuber (2018) who train a Variational Auto-Encoder (Kingma & Welling, 2014) to encode game frames into a
latent vector"
So it means they most likely took https://worldmodels.github.io/ (that is actually open-source) or something similar and swapped the frame generation by Stable Diffusion that was released in 2022.