This is a crazy paper. In some areas, a first-generation diffusion model is beating Llama 3, a model with a huge amount of tuning and refinement work behind it. And it's from China again!
A whole new "tree" of development has opened up. With so many possibilities - traditional scaling laws, out-loud chain of thought, in-model layer-repeating chain of thought, and now diffusion models - it seems unlikely to me that LLMs are going to hit a wall that the river of technological progress cannot flow around.
I wonder how well they'd do at translation. The paper indicates that they're rather good at poetry.
Interesting times.