
And they seem to be about 10x as fast as similarly sized transformers.


No, 10x fewer sampling steps. Whether that means 10x faster remains to be seen, as a diffusion step tends to be more expensive than an autoregressive step.
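
To make that concrete, a rough back-of-envelope in Python. Every number below is a made-up placeholder, not a measurement:

    # Fewer steps != proportionally faster if each step costs more.
    n_tokens = 256                 # tokens to generate
    ar_steps = n_tokens            # autoregressive: one step per token
    diff_steps = n_tokens // 10    # hypothetical diffusion sampler: ~10x fewer steps

    ar_step_ms = 5.0               # assumed cost of one autoregressive decode step
    diff_step_ms = 20.0            # assumed cost of one (heavier) diffusion step

    ar_total = ar_steps * ar_step_ms
    diff_total = diff_steps * diff_step_ms
    print(f"AR: {ar_total:.0f} ms, diffusion: {diff_total:.0f} ms, "
          f"speedup: {ar_total / diff_total:.1f}x")
    # With these placeholder costs the speedup is ~2.6x, not 10x.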


If I understood correctly, in practice they show an actual speed improvement on high-end cards, because autoregressive LLMs are memory-bandwidth limited rather than compute bound, so switching to an approach that costs more compute but less memory bandwidth works well on current hardware.
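
A quick roofline-style sketch of that intuition. The GPU figures below are ballpark assumptions for a high-end card, not real specs:

    # Batch-of-1 autoregressive decoding reads all weights to score one token;
    # a step that scores a whole block amortizes those reads across the block.
    params = 7e9                  # model parameters
    bytes_per_param = 2           # fp16/bf16 weights
    peak_flops = 300e12           # assumed peak throughput, FLOP/s
    peak_bw = 2e12                # assumed memory bandwidth, bytes/s

    def step_time(tokens_per_pass):
        """Lower bound on one forward pass scoring `tokens_per_pass` tokens."""
        flops = 2 * params * tokens_per_pass     # ~2 FLOPs per param per token
        weight_bytes = params * bytes_per_param  # weights read once per pass
        compute_s = flops / peak_flops
        memory_s = weight_bytes / peak_bw
        return max(compute_s, memory_s), compute_s, memory_s

    for tokens in (1, 256):
        total, c, m = step_time(tokens)
        bound = "compute" if c > m else "bandwidth"
        print(f"{tokens:4d} tokens/pass: {total*1e3:.2f} ms ({bound}-bound)")
    # 1 token/pass (autoregressive decode) is dominated by weight reads;
    # 256 tokens/pass (a diffusion step over a block) is compute-bound instead.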


The SEDD architecture [1] probably allows parallel sampling of all tokens in a block at once (toy sketch below), which may be faster in wall-clock time but not necessarily cheaper overall, i.e. runtime times computational resources used.

[1] Which Inception Labs's new models may be based on; one of the cofounders is a co-author. See equations 18-20 in https://arxiv.org/abs/2310.16834
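
For intuition only, a toy sketch of parallel-over-tokens sampling in Python. This is the generic masked-diffusion idea, not SEDD's actual transition rule from eqs. 18-20, and the model is a random stand-in:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab, block = 50, 16
    MASK = vocab  # extra "masked" token id

    def model_logits(tokens):
        # Stand-in for a real denoiser: random logits per position.
        return rng.normal(size=(len(tokens), vocab))

    tokens = np.full(block, MASK)
    for step in range(4):  # a few denoising steps, not one step per token
        probs = np.exp(model_logits(tokens))
        probs /= probs.sum(axis=1, keepdims=True)
        # One categorical draw per position, all positions at once:
        draws = np.array([rng.choice(vocab, p=p) for p in probs])
        still_masked = tokens == MASK
        # Unmask a random subset each step (a common masked-diffusion schedule).
        unmask = still_masked & (rng.random(block) < 0.5)
        tokens = np.where(unmask, draws, tokens)
    tokens = np.where(tokens == MASK, draws, tokens)  # finalize leftovers
    print(tokens)

The point is just that each pass scores the whole block, so the step count is set by the denoising schedule rather than the block length.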



