
It's obviously a productive change, and kudos for taking it on, but much of the enthusiasm here was driven by the entirely unanticipated prospect of running a model at full speed using less memory than the model's own footprint, and by the notion that inference with a dense model somehow behaved in a sparse manner at runtime. Best to be a bit more grounded here, particularly with regard to claims that defy common understanding.



I wanted it to be sparse. Doesn't matter if it wasn't. We're already talking about how to modify the training and evaluation to make it sparser. That's the next logical breakthrough in getting inference for larger models running on tinier machines. If you think I haven't done enough to encourage skepticism, then I'd remind you that we all share the same dream of being able to run these large language models on our own. I can't control how people feel. Especially not when the numbers reported by our tools are telling us what we want to be true.
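On the "numbers reported by our tools" point: this is a minimal sketch (not from the thread, and not llama.cpp's code) of why RSS-style memory counters can mislead when weights are loaded with mmap. The file, sizes, and page-stride here are illustrative assumptions; the underlying behavior is standard: a file-backed mapping consumes almost no physical memory until its pages are actually touched, and touched pages remain shared and evictable, so "memory used" depends entirely on what the counter counts.

```python
import mmap
import resource
import tempfile

# Hypothetical stand-in for a model weights file: 64 MiB of zeros.
SIZE = 64 * 1024 * 1024
PAGE = 4096

with tempfile.TemporaryFile() as f:
    f.truncate(SIZE)  # sparse file; reads back as zero bytes

    # Peak resident set size before mapping (KiB on Linux, bytes on macOS).
    rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    # Mapping reserves address space only; no physical pages are
    # resident yet, so a naive "memory used" reading stays small.
    m = mmap.mmap(f.fileno(), SIZE, prot=mmap.PROT_READ)

    # Touch one byte per page to fault everything in, the way real
    # inference would as it streams through the weights.
    touched = sum(m[i] for i in range(0, SIZE, PAGE))

    # Peak RSS after touching: now the pages count as resident,
    # even though they are file-backed, shared, and evictable.
    rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    m.close()
```

The gap between `rss_before` and `rss_after` is the whole illusion: sample the counter before the pages are faulted in and the model appears to run in less memory than its footprint; sample it after and the footprint is back.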



