I am not sure that a diffusion approach is all that suitable for generating language. Word are much more discrete than pixels.
Diffusion doesn't work on pixels directly either, it works on a latent representation.
reply
I am not sure that a diffusion approach is all that suitable for generating language. Word are much more discrete than pixels.