It’s the fundamentals that underly Stable Diffusion, Dalle, and various other SOTA image generation models, video, and audio generation models. They’ve also started taking off in the field of robotics control [1]. These models are trained to incrementally nudge samples of pure noise onto the distributions of their training data. Because they’re trained on noised versions of the training set, the models are able to better explore, navigate, and make use of the regions near the true data distribution in the denoising process. One of the biggest issues with GANs is a thing called “mode collapse” [2].
[1] https://www.physicalintelligence.company/blog/pi0
[2] https://en.wikipedia.org/wiki/Mode_collapse