Ask HN: Has anybody explored diffusion models as a basis for LLMs?
1 point by Mockapapella on May 31, 2023 | 3 comments
If you think of the characters as pixels, then you should be able to apply a similar process, right?



It is, as always, not quite that easy, because what is the equivalent of smooth continuous Gaussian noise for an ASCII character? What does a letter like 'z' jitter to/from?

But here is a bibliography of some relevant papers on diffusion models for discrete data which you might find useful: https://gwern.net/doc/ai/nn/diffusion/discrete/index
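To make the "what does 'z' jitter to/from?" problem concrete, here is a toy sketch (all names illustrative, not from any particular implementation) contrasting naive Gaussian jitter on code points with the uniform-replacement corruption used in discrete-diffusion work such as D3PM:

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def jitter_codepoints(text: str, sigma: float, rng: random.Random) -> str:
    # Naive analogue of Gaussian noise: jitter each character's code point.
    # Even sigma = 1 flips letters into unrelated ones, because characters
    # have no meaningful metric ordering -- exactly the 'z' problem above.
    return "".join(chr(max(32, round(ord(c) + rng.gauss(0, sigma)))) for c in text)

def corrupt_uniform(text: str, beta: float, rng: random.Random) -> str:
    # Discrete alternative: with probability beta, resample the character
    # uniformly from the alphabet (the uniform transition kernel in D3PM).
    return "".join(rng.choice(ALPHABET) if rng.random() < beta else c for c in text)

rng = random.Random(0)
print(jitter_codepoints("the quick brown fox", 1.0, rng))
print(corrupt_uniform("the quick brown fox", 0.3, rng))
```

The first function shows why the continuous recipe transfers poorly; the second is closer to what the papers in that bibliography actually do.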


Thanks, I'll look into it.

To be clear, the analogy I'm making is: an image is to a phrase as a pixel is to a character. In your example, the letter 'z' would jitter between random Unicode characters. I'm not sure how to deal with the problem of "the letter is close to right but not quite". That's fine for images, which don't always have to be perfect, but if you misspell "hello" as "heoko" it is immediately noticeable.

With the image generators, they are able to take a relatively small text prompt and create an image that is many bytes larger than the input. My idea would follow the same structure: given a small text prompt, the LLM would generate a large amount of random noise that would be resolved into something resembling the input prompt. I think some form of RLHF would be needed to make it easier to use and more useful, but beyond a single `prompt:response` format I don't know how I could turn it into something resembling modern chat bots.
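A tiny sketch of that asymmetry: two wrong characters out of five is a small perturbation by pixel standards, yet it breaks the word completely.

```python
def hamming(a: str, b: str) -> int:
    # Number of positions where two equal-length strings differ.
    return sum(x != y for x, y in zip(a, b))

# Only 2 of 5 characters are wrong -- a perturbation an image could often
# absorb -- yet "heoko" is instantly recognizable as broken text.
print(hamming("hello", "heoko"))  # prints 2
```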


> With the image generators, they are able to take a relatively small text prompt and create an image that is many bytes larger than the input.

Yes, but that's not the hard part. The diffusion models in question work fine without any kind of text supervision, modeling the image alone. We also have plenty of text embeddings, so conditioning is not the hard part either. The hard part is: how do you get a model which takes a large amount of 'random noise' (what even is 'random noise' in text?) and makes a tiny update that nudges it toward something more sensible, such that repeated updates diffuse toward high-quality text? Once that's solved, control or conditioning is relatively easy.
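One way to picture the "tiny update per step" framing is a loose sketch in the spirit of absorbing-state discrete diffusion (MaskGIT-style iterative unmasking). Everything here is hypothetical scaffolding: `reverse_diffusion`, `MASK`, and the toy model (which just memorizes one target string to exercise the loop) stand in for a real trained denoiser.

```python
import random

MASK = "_"  # the absorbing "noise" state: a fully masked string

def reverse_diffusion(length: int, steps: int, model, rng: random.Random) -> str:
    xs = [MASK] * length
    for t in range(steps):
        # Ask the (hypothetical) model for a guess at every position, then
        # commit only a fraction of the still-masked ones -- the small
        # per-step update that nudges noise toward sensible text.
        guesses = model(xs)
        masked = [i for i, c in enumerate(xs) if c == MASK]
        k = max(1, len(masked) // (steps - t))
        for i in rng.sample(masked, min(k, len(masked))):
            xs[i] = guesses[i]
    return "".join(xs)

# Toy "model" that just knows one target string, to exercise the loop.
target = "hello world"
toy_model = lambda xs: list(target)
print(reverse_diffusion(len(target), 4, toy_model, random.Random(0)))  # prints "hello world"
```

The open problem gwern describes is, of course, what replaces `toy_model`: a network whose per-step guesses are good enough that this loop converges to fluent text from scratch.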



