
Thanks, good to know. Perhaps it is different for diffusion; with LLMs, layers are generally split across GPUs, so inference has to finish on one GPU before the activations can be passed across the layer split.
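A minimal sketch of what that sequential dependency looks like (toy functions standing in for layers and devices; the names are illustrative, not from any framework): with the layers split into two stages, the second stage cannot start until the first has produced its activations.

```python
# Toy pipeline-parallel split: a model's layers divided across two "devices".
# With a single request, stage 1 sits idle until stage 0 finishes.

def make_stage(layers):
    """Compose a list of layer functions into one stage."""
    def stage(x):
        for layer in layers:
            x = layer(x)
        return x
    return stage

# Four toy "layers", split 2/2 across the two devices.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]
stage0 = make_stage(layers[:2])   # lives on GPU 0
stage1 = make_stage(layers[2:])   # lives on GPU 1

x = 5
h = stage0(x)   # GPU 0 computes; GPU 1 is idle
y = stage1(h)   # activations "transferred", then GPU 1 computes
print(y)        # ((5 + 1) * 2 - 3) * 10 = 90
```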


That's only if your model is too big for a single GPU and you're not batching.
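To illustrate why batching changes the picture, here is a toy tick-by-tick schedule (a hypothetical helper, not any library's API): with micro-batches, while GPU 1 processes micro-batch i, GPU 0 is already working on micro-batch i+1, so both devices stay busy except at the edges of the pipeline.

```python
# Toy schedule for a 2-stage pipeline fed with micro-batches.
# Each row shows which micro-batch each stage works on at that step.

def pipeline_schedule(n_microbatches, n_stages=2):
    """Per time step, the micro-batch index each stage handles (None = idle)."""
    steps = []
    for t in range(n_microbatches + n_stages - 1):
        row = []
        for s in range(n_stages):
            mb = t - s
            row.append(mb if 0 <= mb < n_microbatches else None)
        steps.append(row)
    return steps

for t, row in enumerate(pipeline_schedule(4)):
    print(t, row)
# With 4 micro-batches and 2 stages, only the first and last steps
# have an idle stage; every step in between keeps both GPUs busy.
```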


Yes, that's what I was doing. Thanks for the info.



